Rome Laboratory (RL) has been performing Speech Processing Research and Development
since the early sixties. Areas of research include Speech Recognition, Speaker
Identification, Language Identification, Keyword Recognition, Platform Identification,

set of externals for the Testbed.

PREFACE


The Rome Laboratory (RL) External/Internal (E/I) Data Fusion Testbed was a
product of an RL research contract with HRB Systems. This preface describes the job of
an RL program manager, the requirements for progression of an RL program manager
into upper management, and how the courses associated with this MS in Business
have benefitted the E/I program and this thesis.

Within  Rome  Laboratory,  there  are  a  large  number  of  funded  and  cooperative 
research projects with industry or universities. The E/I Data Fusion Testbed contract, the
subject of this thesis, is an example.

An RL engineer's job as program manager on these research projects is to
perform two functions: to manage the contract technically and
administratively.

Administratively  these  projects  require  monitoring  of  schedule  and  cost.  HRB 
provided  monthly  status  reports  containing  a  current,  detailed  list  of  contractual 
milestones and actual vs. planned expenditures. Typically, the schedule is provided as
a Gantt chart and the funds portion contains expenditures for the next three months in
a Government standard Contract Funds Status Report. SUNY Graduate Classes that
allowed for a better understanding of these items included Project Management
(Meredith and Mantel, 1989), where Gantt charts were covered in detail, and
Accounting (Weygandt et al., 1990), which dealt with all aspects of direct and indirect
costs. 

Technically,  it  is  necessary  to  understand  and  work  as  a  team  with  HRB  to 
develop an approach to solve the Government's research and development needs.
Project Management allowed for the development of a project which simulated a
manager's performance on a contract from conception to delivery. This activity
allowed for a better understanding of the tasks the HRB program manager performs on
an RL contract. Other essential skills of conflict resolution when contractual problems
arise, effective communication skills to deal with a contractor, and group behavior
were  dealt  with  in  Organizational  Behavior  (Robbins,  1991). 

Technically, research associated with writing a paper in the Research Seminar
(Parker, 1993) and its continuation for this thesis benefitted the E/I Data Fusion Testbed
contract. While planning and performing experiments, several things were uncovered
about E/I that were not understood and required clarification/explanation by the
contractor to RL. These are further described in Chapters 3 and 5 of this thesis. Also,
while thinking about how to test E/I (see Chapter 4), these concepts were passed to the
contractor before submission of the required test plan. In addition, while documenting
the individual components of E/I (see Chapter 2), several pieces were found to be
ambiguously stated and not well understood and generated questions that were
answered  by  the  contractor.  Finally,  while  planning  experiments,  different  summary 
statistics  were  needed.  These  needs  were  discussed  with  the  contractor  and  later 
added  to  the  Testbed. 

Though  oriented  around  business  concepts,  summary  statistics  used  during  this 
thesis  and  discussed  in  the  Research  Seminar  paper  were  reviewed  during  a  class  in 
Statistics  (Moore,  1989).  These  concepts  included  mean,  standard  deviation,  median, 
mode and time series analysis techniques.

Another important job of an RL program manager is the technical and cost
evaluation of competing contractor responses to an RL request for proposal. This
involves a certain level of technical expertise to understand the contractor's proposed
approach. Data Communications (Silver and Silver, 1987) enabled a better technical
understanding of modems and communications channels, both related to speech
processing. In addition, a class in Database Management (Stamper and Price, 1990)
allowed a better understanding of a vital part of speech processing since RL is in the
process of creating two unique databases for research and development (Lambert
et al., 1993).

Frequently, work breakdown structures, Gantt charts, PERT charts and other
management items are also included in a technical proposal. A general organizational
chart is often contained along with a summary of business goals of the company.
Aspects of the Project Management and Organizational Behavior classes contributed to
a better understanding of these portions of the proposal. If available, corporate
financial statements could also be examined to determine if the company is stable
enough  to  be  awarded  the  contract.  Elements  of  Managerial  Finance  (Gitman,  1991) 
and  Accounting  would  aid  in  this  analysis. 

In addition, the proposed costs are evaluated from a technical viewpoint for
adequacy and necessity. Though RL has its own financial and cost analysis
personnel, a certain high level understanding of Accounting, Financial Management
(Anthony et al., 1992) and Managerial Finance provides for a better understanding of
direct costs, indirect costs, cost of money, profit, loss, labor rates and categories,
overhead costs, travel expenses, etc.

UPPER  MANAGEMENT  AT  RL 

Migration from program management to RL upper management has two
different tracks: technical and managerial.

Both  tracks  require  a  minimum  of  12  business  credits  for  Government-wide 
certification at the correct level for these positions. Classes in Accounting, Finance,
Macroeconomics,  Microeconomics,  Business  Law,  Marketing,  Organizational  Behavior, 
and  Statistics  count  towards  this  requirement.  This  degree  has  provided  the  classes  to 
fulfill  this  requirement. 

In  the  technical  track,  RL  personnel  desire  to  achieve  the  title  of  group  leader  or 
technical director. These individuals strive for world class technical stature in an area,
with some small sense of business skills. The computer science electives are not geared
for personnel in this track since they are taught with a business orientation.

In the managerial track, a high-level, non-detailed understanding of technical
aspects is required with a maximum of managerial skills. This track involves
management of people, not technology. Program and project managers prepare
budgets and track financial obligations and expenditures. The budget process was
described in Accounting and Financial Management. Supervisors perform performance
appraisals, deal with upper management and multiple personalities, make managerial
decisions, hire personnel, and approve decisions of program managers. These areas
were dealt with in Organizational Behavior.

Finally, in the new President Clinton initiative, developing civilian uses of military
technology is extremely important. In addition, the local RL politicians are attempting to
employ this strategy to transfer as much RL technology to the local community as
possible to minimize the chances of the laboratory moving to other locations. As a
result, a background in Marketing (Boone and Kurtz, 1989) strategies and techniques is
beneficial for RL engineers.

Chapter 1

INTRODUCTION 

Rome  Laboratory  (RL)  has  been  performing  speech  processing  research  and 
development  since  the  early  sixties.  Areas  of  research  include  speech  recognition, 
speaker identification, language identification, keyword recognition, platform
identification, and external/internal (E/I) data fusion.

Data  fusion  for  this  thesis  refers  to  the  merging  of  information  produced  by  three 
independent  data  sources:  externals,  internals,  and  Electronic  Intelligence  (ELINT) 
information. The RL E/I Data Fusion Testbed is a system which allows for experiments to
be  performed  to  merge  and  correlate  these  data  sources.  These  terms  are  further 
defined  in  Section  1.1. 

The objective of this thesis is to perform an experiment examining the weighting of
decisions  made  by  the  external,  internal,  and  ELINT  processes.  It  examines  the 
relationships  between  decisions  made  by  these  processes  and  gives  some  relative 
importance  to  these  decisions.  It  was  chosen  due  to  its  ease  of  use,  and  relevance  to 
both  the  speech  processing  and  operational  communities.  This  experiment  is  described 
in  detail,  including  its  design,  analysis  and  conclusions. 

This  thesis  also  describes  the  history  of  the  Testbed  development  and  its  general 
approach  to  data  fusion.  The  thesis  also  details  a  preliminary  experiment  to  get  more 
familiar  with  the  capabilities  of  the  Testbed.  Finally,  the  thesis  describes  many  other 
experiments  which  could  be  run  with  the  Testbed. 

1.1  DEFINITION  OF  TERMS 

Before describing the E/I Data Fusion Testbed, it is necessary to understand some
of the terms which will be used throughout this thesis. This section provides a definition
of  these  terms  (Shirey  and  Morgan,  1993). 

Internals are processes that analyze an audio signal and result in a series of
decisions  and  scores.  Externals  are  characteristics  of  a  voice  communication  other  than 
the  audio  signal.  Since  the  intent  of  this  thesis  is  to  perform  experiments  with  the  E/I 
Testbed,  the  technical  details  and  performance  levels  of  the  individual  externals  and 
internals  are  not  described  in  any  detail. 

1.1.1  INTERNALS 

The  internal  speech  processing  functions  are  the  RL  speaker  identification, 
language identification, platform identification, and keyword recognition systems. Each
system makes decisions based on characteristics extracted from the audio signal and is
in a different stage of research and development. Details on the state of the art of
these  processes  can  be  obtained  from  the  yearly  Proceedings  of  the  IEEE  Acoustics, 
Speech,  and  Signal  Processing  Conference. 

Speaker identification is the uncooperative recognition of a speaker based on
features  extracted  from  the  speech  of  an  audio  transmission.  This  technology  is  being 
researched  by  RL  independent  of  language  and  the  words  that  are  spoken. 

Language identification is the uncooperative recognition of the language based
on features extracted from the speech of an audio communication. This technology is
being researched by RL independent of speaker and the words that are spoken.

Platform identification is the function of identifying the platform of a speaker
based on features extracted from the background noise of an audio communication.
For our purposes, platform can refer to differentiating a helicopter from an airplane or
specifically determining if the platform is an F-16 or a B-52 aircraft.

Keyword  recognition  is  the  recognition  of  words  and  phrases  in  a  communication. 
This can be performed in a speaker dependent fashion, where every word to be
searched for is spoken at least once by a specific speaker. Speaker independent refers
to the case where a general speech model is formed for each word in the database.
The tie-in between E/I and keyword recognition is illustrated in Section 1.2.

1.1.2  EXTERNALS  AND  ELINT 


Table  1  gives  a  hypothetical  example  that  illustrates  the  externals.  This  table 
represents  short  news  reports  on  President  Clinton  obtained  from  several  radio  station 
listeners at various times during the same day.


FREQ.   MOD.   LOC.   TIME   SPEAKER         SHORT SUMMARY
1310    AM     ROME   1100   SPK1            LOCAL NEWS REPORT ON CLINTON RE: ECONOMY
1350    AM     NY     1110   RUSH LIMBAUGH   EDITORIAL RE: CLINTON'S VIEWS ON HEALTH CARE
103.4   FM     NY     1500   LARRY KING      DISCUSSION ON CLINTON'S DEFENSE STRATEGY
105.3   FM     BOS    1600   DAN RATHER      DISCUSSION ON ECONOMY

TABLE 1: EXTERNAL/INTERNAL EXAMPLE


The  externals  in  the  Testbed  are  frequency,  modulation  type,  radio  station 
location, radio type, and direction with respect to the listener. In Table 1, the radio
station's  frequency  (FREQ.),  modulation  type  (MOD.),  and  location  (LOC.)  are  recorded 
based  on  the  position  of  the  radio  dial  where  the  news  report  was  found  (e.g.,  Rome 
station  WTLB  1310  AM).  The  other  two  externals  refer  to  the  direction  (DIR.)  of  the  station 
with  respect  to  the  listener  measured  in  degrees  from  true  North  and  radio  type  (RT.  - 
Walkman,  stereo,  etc.). 

External systems are not totally accurate when measuring the frequency and
direction parameters. In the E/I Data Fusion Testbed, these externals are used subject to
error tolerances. For example, occasionally the radio station located at frequency
103.4 MHz can be heard at 103.5 MHz.

Electronic  Intelligence  (ELINT)  reports  are  also  included  in  the  Testbed.  These 
separate reports contain the type and location of the desired radio stations and are
determined by specially designed ELINT systems. The calculation of the ELINT location
parameter  is  also  subject  to  error  tolerances. 
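
The error tolerances described above can be illustrated with a minimal Python sketch. The tolerance values and function names below are hypothetical and chosen only for illustration; the thesis does not state the tolerances actually used by the Testbed.

```python
def within_tolerance(measured, reference, tolerance):
    """Return True when a measured value is close enough to a reference value."""
    return abs(measured - reference) <= tolerance


def direction_difference(a_deg, b_deg):
    """Smallest angle, in degrees, between two bearings measured from true North."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


# Hypothetical tolerances, assumed only for this illustration.
FREQ_TOLERANCE_MHZ = 0.15
DIR_TOLERANCE_DEG = 5.0

# The station listed at 103.4 MHz is heard at 103.5 MHz and still matches.
freq_match = within_tolerance(103.5, 103.4, FREQ_TOLERANCE_MHZ)

# Bearings of 358 and 2 degrees are compared across true North.
dir_match = direction_difference(358.0, 2.0) <= DIR_TOLERANCE_DEG

print(freq_match, dir_match)
```

The direction comparison wraps around 360 degrees so that two bearings on either side of true North are treated as close together.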

1.1.3  OTHER  DEFINITIONS 

To reduce duplication of Table 1 news reports due to multiple listeners scanning
the radio at the same time, a routing table could be developed. This table specifically
assigns speakers, languages, and news events to listeners for automatic radio station
search. For instance, if the routing table for listener one contained the speech of Rush
Limbaugh, listener two could skip that broadcast automatically and go on to some
other news report. In the E/I Data Fusion Testbed, these tables can be easily created
and modified as described in Section 5.3.6 and are the source of the routing statistics
described in Section 2.3.
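
A routing table of this kind might be sketched as follows. The listener names, the dictionary layout, and the should_process function are assumptions made for illustration and do not reflect the Testbed's actual table format.

```python
# Hypothetical routing table: each listener is assigned a set of speakers.
routing_table = {
    "listener_1": {"RUSH LIMBAUGH"},
    "listener_2": {"LARRY KING", "DAN RATHER"},
}


def should_process(listener, speaker, table):
    """Return False when the speaker is assigned to a different listener."""
    assigned_elsewhere = any(
        speaker in speakers
        for other, speakers in table.items()
        if other != listener
    )
    return not assigned_elsewhere


# Listener two skips the broadcast that is routed to listener one.
print(should_process("listener_2", "RUSH LIMBAUGH", routing_table))
print(should_process("listener_1", "RUSH LIMBAUGH", routing_table))
```

In this sketch a speaker that appears in no routing entry is processed by whichever listener finds the broadcast.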

RL technology is being developed for tactical speech processing applications.
This means that the decisions made by the speech processing systems must be
determined in as little time as possible. Strategic applications have the luxury of longer
decision and training times since there are less rigorous time constraints on the results.

Finally, the words activity and scenario must be defined. In Air Force (AF) terms,
an activity is an event for which a report is created. A scenario is the sequence of
events which are reported. A typical AF scenario is composed of several activities.
Table  1  is  also  called  a  scenario  report  and  the  short  summaries  are  representative  of 
the  activity. 

1.2  EXTERNAL/INTERNAL  MOTIVATION 

Waltz and Llinas state that the objective of data fusion "is to derive more
information through combining, than is present in any individual element of input data"
(Waltz and Llinas, 1990). The concept of E/I data fusion evolves from this definition. This
section uses information extracted from Hamer and Foil (1987) and Shirey and
Morgan  (1993). 

Considering  only  the  internals,  interrelationships  between  speaker  identification 
and  keyword  recognition  can  be  demonstrated  since  speaker  dependent  keyword 
recognition  produces  better  results  than  speaker  independent  recognition.  Thus,  if  the 
speaker  is  accurately  recognized  via  speaker  identification,  then  speaker  dependent 
recognition  can  be  performed  by  loading  in  the  proper  speaker's  keyword  training 
templates. 

Further,  people  speak  a  limited  number  of  languages.  Thus,  if  accurate 
identification  of  the  speaker  can  be  obtained,  then  these  internal-internal 
interrelationships  (contained  in  an  associations  table)  can  be  used  to  verify  the  results  of 
the  language  identification  system.  This  illustrates  the  data  fusion  relationships  between 
language  identification  and  speaker  identification. 
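
As a minimal sketch of this check, assume a hypothetical associations table that maps each speaker to the languages that speaker is known to use. The speaker and language entries and the function name are illustrative and are not Testbed data.

```python
# Hypothetical associations table: languages each known speaker uses.
associations = {
    "SPK1": {"ENGLISH"},
    "SPK2": {"ENGLISH", "SPANISH"},
}


def language_consistent(speaker, language, table):
    """True when the identified language is one the identified speaker uses."""
    return language in table.get(speaker, set())


# A language decision that conflicts with the speaker decision is flagged.
print(language_consistent("SPK1", "ENGLISH", associations))
print(language_consistent("SPK1", "SPANISH", associations))
```

A failed check does not say which of the two decisions is wrong; it only indicates that the speaker and language results disagree with the stored associations.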

To  illustrate  the  concept  of  externals'  providing  information  on  speakers  of  interest, 
observe  the  scenario  report  in  Table  1.  In  this  report,  interrelationships  are  built  between 
externals and speakers. Similarly, these interrelationships could be developed between
externals, languages, and platforms. These interrelationships, expressed in an external
relations table, provide the required information to perform the correlation of externals
with internals. Similarly, ELINT relations tables can be developed to associate ELINT data
with  internals.  The  creation  and  use  of  these  relations  tables  is  described  in  Section  2.2. 

1.3  EXTERNAL/INTERNAL  HISTORY 

The  history  of  the  RL  E/I  program  parallels  the  USAF  programmatic  cycle  for 
research  and  development.  This  cycle  begins  with  a  6.1  basic  research  stage  in  which 
theoretical  research  without  application  is  performed.  In  the  6.2  Exploratory 
Development  stage,  an  algorithm  is  developed  on  some  computer  for  experimentation, 
test,  and  evaluation  without  regard  to  performance,  speed,  or  any  guarantee  of 
success. In the 6.3A stage, an application component is added to the 6.2 stage for this
experimentation, bridging the gap between 6.2 and 6.3 research. In the 6.3B
Advanced  Development  stage,  an  operational  system  is  designed,  developed, 
implemented,  and  fabricated  around  an  application  for  operational  test,  evaluation, 
and  deployment.  The  final  stage,  6.4,  represents  Production  and  Deployment  in  which 
several  systems  are  put  into  the  field  for  operational  use. 

The  rest  of  this  section  generally  illustrates  the  chronological  history  and 
development  stages  of  the  RL  sponsored  E/I  research  program.  Basic  research  (6.1) 
work  in  data  fusion  has  evolved  through  other  areas  of  Rome  Laboratory,  academia, 
and  other  government  and  industrial  sponsorship.  No  sponsorship  has  come  directly 
from  the  RL  speech  processing  group. 

1.3.1  EXTERNAL/INTERNAL  6.1  PROGRAM 

The  initial  RL  6.1  E/I  data  fusion  program,  completed  in  1985,  performed  a 
study  and  prepared  a  report  on  the  data  fusion  of  externals  and  internals.  In  the 
study (Hamer and Foil, 1987), several externals were identified and defined, and an
architecture  was  developed  for  E/I  data  fusion.  Figure  1  illustrates  this  architecture. 

Figure  1  shows  digitized  audio  input  to  the  speaker  and  language  recognition 
systems,  each  having  their  own  speech  based  training  models.  These  systems  output 
a  series  of  decisions  and  scores.  In  parallel  with  these  internal  processes,  lists  of 
externals  are  input  to  speaker  and  language  ranking  and  likelihood  assignment 
processes,  based  around  trained  external  models.  Results  of  these  external  and 
internal processes are fused by speaker and language decision logic with an
"educated result" produced. The internal-internal interaction between the language
decision logic and speaker decision logic is also shown. Following the "educated"
speaker and language decision, this information along with the external data is input
into the keyword recognition system to select the appropriate word templates. This
architecture also calls for the additional
keyword/language match to ensure the language of the keyword agrees with the
language identification result. The flow of this architecture is being followed today and
is further described in Chapter 2 of this thesis.

Figure 1: External/Internal Configuration

1.3.2  EXTERNAL/INTERNAL  6.2  AND  6.3A  STAGES 

Since  the  initial  study,  three  Exploratory  Development  Models  (EDM)  have  been 
developed. All three have transitioned the 6.1 work previously described into the 6.2
and  6.3A  stages. 

The initial E/I EDM (Woodsum and Harms, 1987), completed in 1987, was
developed on a VAX 11/780, with a user interface and simulation developed using the
Digital  Equipment  Corporation  Forms  Management  System  (FMS). 

Two of the deficiencies of this EDM were the inflexibility of this proprietary FMS
package and the difficulty of adding, modifying, and changing externals and internals. Three
data fusion strategies included were Bayesian, addition of external and internal scores,
and addition of external and internal ranks.
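
Two of these strategies, addition of scores and addition of ranks, might be sketched as follows. The thesis does not give the formulas used in the first EDM, so the functions, the tie-breaking rule, and the example scores are assumptions made for illustration; the Bayesian strategy is not shown.

```python
def fuse_by_score_addition(external, internal):
    """Add external and internal scores per candidate; the highest sum wins."""
    candidates = set(external) | set(internal)
    totals = {c: external.get(c, 0) + internal.get(c, 0) for c in candidates}
    # Ties are broken alphabetically (an assumption of this sketch).
    best = min(totals, key=lambda c: (-totals[c], c))
    return best, totals


def ranks(scores):
    """Rank candidates from 1 (best score) downward; ties broken by name."""
    ordered = sorted(scores, key=lambda c: (-scores[c], c))
    return {c: position for position, c in enumerate(ordered, start=1)}


def fuse_by_rank_addition(external, internal):
    """Add external and internal ranks per candidate; the lowest sum wins."""
    ext_ranks, int_ranks = ranks(external), ranks(internal)
    common = set(ext_ranks) & set(int_ranks)
    totals = {c: ext_ranks[c] + int_ranks[c] for c in common}
    best = min(totals, key=lambda c: (totals[c], c))
    return best, totals


# Hypothetical confidence scores for three candidate speakers.
external_scores = {"SPK1": 80, "SPK2": 40, "SPK3": 60}
internal_scores = {"SPK1": 55, "SPK2": 90, "SPK3": 50}

print(fuse_by_score_addition(external_scores, internal_scores)[0])
print(fuse_by_rank_addition(external_scores, internal_scores)[0])
```

Score addition lets a single very confident source dominate, while rank addition discards the size of the score differences and keeps only the ordering.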

To  overcome  the  problems  with  the  first  E/I  EDM,  a  second  research  program  was 
funded  and  completed  in  late  1989  (George,  1990).  This  EDM  was  developed  on  a  SUN 
workstation,  with  Inference  Corporation's  Automated  Reasoning  Tool  (ART)  expert 
system  shell  performing  rule  based  data  fusion. 

An operational database was created for realistic simulation. This database was
created from twelve AF activities which were organized into nine different scenarios. All
scenarios have a complete record of the conversations that occurred, with the
speakers, languages, and externals accurately denoted and stored as ASCII scenario
ground truth files. These files are the source of the simulated inputs described in Section
2.1 of this thesis. Finally, audio tapes were produced which could be input into the
actual  speaker  identification  and  language  identification  systems  to  create  live  internal 
results  rather  than  the  simulations. 


A simulation was created of the external, speaker identification and language
identification functions on an INTEL 310AP microprocessor. This simulation was created
to allow the experimenter to select and vary score sequences of externals, speakers,
and languages in a fashion similar to the actual external and internal systems. The
simulation allowed for the selection of scenario groups (1,2,3), (4,5,6), or (7,8,9).
Following selection of the scenario group, all correct speakers and languages known by
the software were assigned a confidence score of 90. All other speakers and
languages in the scenarios were assigned an identical confidence score of 50. The
experimenter could then tediously go through one by one decreasing the correct
internal score and/or increasing the incorrect ones to simulate errors made by the
internals. The alternative to this tedious process was to enter numerical values for
distortion, which randomly added or subtracted an amount from each score in a fashion
controlled by the software in the INTEL. This concept is the basis for the simulation which
is described in Section 2.1 of this thesis. Externals could be altered in the same manner.
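
The score assignment and distortion described above might be sketched as follows. The base scores of 90 and 50 come from the text; the uniform noise distribution, the clamp to a 0 to 100 range, and all names are assumptions, since the thesis says only that the amount added or subtracted was controlled by the INTEL software.

```python
import random


def simulate_scores(candidates, correct, distortion, rng):
    """Assign 90 to the correct candidate and 50 to the rest, then distort."""
    scores = {}
    for name in candidates:
        base = 90 if name == correct else 50
        # Assumed noise model: uniform within +/- distortion.
        noise = rng.uniform(-distortion, distortion)
        # Assumed clamp that keeps scores between 0 and 100.
        scores[name] = max(0.0, min(100.0, base + noise))
    return scores


speakers = ["SPK1", "SPK2", "SPK3"]
rng = random.Random(0)  # fixed seed so a run can be repeated

clean = simulate_scores(speakers, "SPK2", 0, rng)
noisy = simulate_scores(speakers, "SPK2", 10, rng)
print(clean)
print(noisy)
```

With a distortion below 20 the correct candidate always keeps the top score in this sketch; larger values allow the simulated internal to make ranking errors.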

Fusion was performed using a rule based expert system, coded using the LISP
based ART shell. The only internal analyzed was speaker identification since no rules
were written for language identification. In this system, the data fusion algorithm
checked the magnitude of the highest ranking speaker identification confidence score.
If this score was higher than a defined threshold, it was reported without performing any
fusion.  If  the  score  was  less  than  the  threshold,  then  the  system  performed  rule  based 
data  fusion. 

The only external used in this data fusion process was the location of the speaker
measured in terms of latitude and longitude. The rules checked to see which speaker
was closest in distance to the simulated location. The system kept track of the average
location of each speaker in the database and also measured the variance. If the
variance was above a threshold, the speaker was designated as moving and
eliminated  from  consideration. 

The algorithm improperly assumed that since the top speaker score was less than
the threshold, it was not correct. As a result, the computer software automatically
eliminated the top scoring speaker from consideration by the data fusion algorithm.

There were two major difficulties with this data fusion algorithm. First, the top
speaker was automatically accepted when its score exceeded the threshold. What
if that speaker was not likely to be at the location? Second, the logic to
automatically eliminate the top scoring speaker from consideration was flawed.
What if the speaker was closest in distance to the reported location? An overall
comparison of speaker identification performance before and after data fusion
showed a decrease in performance due to these difficulties.
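
The decision logic of the second EDM, including the flaw just described, might be sketched as follows. This is an illustration based only on the prose above and is not the original ART rule base; the function name, the use of a single variance number per speaker, the flat distance in degrees, and the example values are all assumptions.

```python
import math


def second_edm_fusion(speaker_scores, mean_locations, location_variance,
                      reported_location, score_threshold, variance_threshold):
    """Sketch of the threshold-then-location rule described for the second EDM."""
    top = max(speaker_scores, key=speaker_scores.get)
    if speaker_scores[top] > score_threshold:
        return top  # accepted without any fusion (first difficulty)

    # Below threshold: the top scorer is discarded (second difficulty), and
    # speakers whose location variance is too high are treated as moving.
    remaining = [
        s for s in speaker_scores
        if s != top and location_variance[s] <= variance_threshold
    ]
    if not remaining:
        return None

    def distance(s):
        lat, lon = mean_locations[s]
        return math.hypot(lat - reported_location[0], lon - reported_location[1])

    return min(remaining, key=distance)


# Hypothetical data: SPK1 has the top score and is also nearest the report.
scores = {"SPK1": 70, "SPK2": 65, "SPK3": 40}
locations = {"SPK1": (43.2, -75.4), "SPK2": (40.7, -74.0), "SPK3": (42.4, -71.1)}
variance = {"SPK1": 0.1, "SPK2": 0.1, "SPK3": 0.1}

result = second_edm_fusion(scores, locations, variance, (43.2, -75.5), 80, 1.0)
print(result)
```

In this example the top scoring speaker is also the closest to the reported location, yet the rule removes that speaker and returns a different one, which is the behavior the text identifies as flawed.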

Another  flaw  of  this  EDM  was  that  all  software  was  written  using  the  LISP 
based  ART  shell.  This  included  the  data  fusion  algorithm,  the  displays,  the  record 
keeping  routines  and  the  SUN  communication  protocols  with  the  INTEL.  As  a  result,  the 
system  collected  an  enormous  amount  of  LISP  generated  "garbage"  as  it  ran.  After 
less  than  twenty  minutes  of  processing,  the  system  stopped  for  "garbage  collection". 
This  characteristic  made  the  system  useless. 

Finally,  the  EDM  only  considered  the  speaker  internal  and  the  location  external 
in  its  data  fusion  process.  A  more  elaborate  rule  based  expert  system  would  be 
required  to  make  this  system  operationally  useful.  These  deficiencies  were  corrected 
in the third E/I EDM.

Chapter 2

EXTERNAL/INTERNAL  DATA  FUSION  TESTBED 

This  chapter  explains  in  detail  the  current  RL  E/I  Data  Fusion  Testbed,  the  third  E/I 
EDM (Shirey and Morgan, 1993). When new terms are defined, they are underlined, as
in  Chapter  1. 

Figure  2  is  provided  as  a  high  level  design  for  creating  and  running  experiments 
on  the  Testbed.  This  high  level  design  begins  with  the  selection  of  scenario  and 
simulation  parameters.  The  External/Internal/ELINT  simulation  algorithm  then  turns  the 
inputs  into  a  sequence  of  LSAP  (language,  speaker,  activity,  and  platform)  decisions. 
The relations tables, which contain the external-internal and ELINT-internal
interrelationships, are input to their respective Processor algorithms to obtain LSAP
decisions based on these inputs. Finally, both lists of LSAP decisions are merged and
correlated by the E/I Data Fusion algorithm to produce LSAP decisions which are stored
in the system audit file along with other intermediate outputs illustrated in Figure 2.
Another input to the Data Fusion algorithm is the internal-internal interrelationships in the
associations table. Each of these parts of the Testbed is described in detail in the
sections  which  follow. 

Experiments  on  this  Testbed  can  be  conducted  to  change  the  inputs  (outside  the 
boxes  in  Figure  2)  and  determine  their  effect  on  the  LSAP  decisions  recorded  in  the 
system  audit  file.  For  example,  an  experimenter  can  select  different  scenarios  and 
different  values  for  the  E/I  simulations  and  evaluate  statistics  which  can  be  obtained 
from  results  in  the  audit  file.  A  separate  experiment  could  process  different  inputs 
through  the  four  data  fusion  algorithms  to  assess  the  performance  of  these  algorithms. 

This Testbed has the flexibility to run a number of different experiments. A short
experiment is described in Chapter 3 of this thesis. A summary of ten possible
experiments  is  given  in  Chapter  4,  and  one  particular  experiment  is  designed  and 
performed  in  Chapter  5. 


[Figure 2 block labels: SCENARIO SELECTION; RELATIONS TABLES; AUDIT FILE; ASSOCIATIONS TABLE]

FIGURE 2: EXTERNAL/INTERNAL TESTBED COMPONENTS

2.1 EXTERNAL/INTERNAL/ELINT SIMULATION

In  the  Testbed,  the  same  nine  scenarios  and  ground  truth  files  as  described  in 
Section 1.3.2 are included. Ground truth files are the correct external and internal
values identified from the created scenarios. Platform identification and ELINT values
were added to these ground truth files in the third EDM.

The  ground  truth  files  are  an  integral  part  of  the  External/ELINT/Internal  simulation 
algorithm.  They  are  also  used  to  create  the  associations  table,  external  and  ELINT 
relations tables and as the correct answers in the creation of statistics in the audit trail.

The  experimenter  initially  selects  any  number  of  these  nine  scenarios.  For 
example,  in  the  experiment  in  Chapter  5  of  this  thesis,  three  groups  of  three  scenarios 
were used.

In addition to scenario selection, the experimenter must select the simulation
length. If zero minutes is selected for the beginning time and five minutes is selected for
the  end  time,  the  simulator  will  only  generate  the  first  five  minutes  of  the  selected 
scenarios.  Each  scenario  is  approximately  45  minutes  in  length. 

Another  parameter  of  the  simulation  to  be  selected  by  the  experimenter  is  the 
decision  report  time.  This  parameter  controls  the  maximum  length  of  time  before  the 
Testbed  makes  a  final  decision.  This  parameter  is  adjustable  from  one  to  five  seconds. 
These  times  are  based  on  the  average  transmission  length  in  the  tactical  speech 
processing  environment.  If  this  parameter  is  set  for  five  seconds  and  a  transmission  is  only 
two  seconds  long,  an  E/I  LSAP  output  report  is  created  without  waiting  for  five  seconds  to 
elapse. 
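The decision report time rule can be sketched as follows. This is a minimal illustration under the description above, not the Testbed code; the function name `report_delay` is hypothetical.

```python
def report_delay(transmission_seconds: float, decision_report_time: float) -> float:
    """Seconds after the start of a transmission at which a report is issued."""
    if not 1 <= decision_report_time <= 5:
        raise ValueError("decision report time is adjustable from one to five seconds")
    # A short transmission is reported when it ends; a long one at the time limit.
    return min(transmission_seconds, decision_report_time)
```

With the parameter set to five seconds, a two second transmission is reported after two seconds, while a ten second transmission is reported after five.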

Another  variable  to  be  selected  in  the  simulation  is  one  of  the  four  flight  paths 
named  A,  B,  C,  and  D.  The  only  parameter  value  changed  as  a  result  of  this  selection  is 
the  direction  external. 

The  experimenter  can  select  any  combination  of  externals  and/or  ELINT  desired 
for  an  experiment.  The  five  externals  as  described  in  Chapter  1  are  location,  direction, 
frequency,  radio  type,  and  modulation  type.  In  addition,  different  presets  are  provided 
to  automatically  choose  popular  external/ELINT  combinations. 

A  confidence  score  can  be  assigned  for  each  external/ELINT  simulation.  This  score 
represents the experimenter's confidence in the measurements provided by the external
and  ELINT  simulation. 

2.1.1  INTERNAL  SIMULATION 

Separate  simulations  are  provided  for  the  speaker  identification,  language 
identification, and platform identification internals; however, the functionality of each
simulation  is  identical.  Each  internal  simulation  allows  for  the  selection  of  the  number  of 
possible  internals  (e.g.,  20  languages,  300  speakers,  20  platforms).  A  series  of  internals  and 
confidence scores is created. The maximum numbers for these simulations were chosen
to be operationally significant.

A second parameter allows the experimenter to choose the number of
confidence  scores  to  be  input  into  the  data  fusion  algorithm.  Thus,  if  the  300  speaker 
simulation  only  has  50  valid  scores,  there  is  no  reason  to  provide  all  300  for  fusion. 

The  internal  simulation  algorithm  can  produce  constant  ground  truth  and  a 
series  of  constant  secondary  scores.  For  instance,  if  there  are  four  languages  in  the 
simulation and a score of 80 is assigned to the ground truth language (L1), and a 50
to the rest, the simulator output will be L1 80, L2 50, L3 50, and L4 50. All other
language score sequences will be produced with the identical pattern. This
simulation is called a constant simulation.

A constant simulation allows for the controlled simulation of internals since the
score patterns are identical and the ground truth internal can be assigned a
value higher than the rest, creating a perfect simulation. Though this method of score
generation is good for debugging algorithms, it is operationally unrealistic. Normally,
the internal confidence scores are not constant, and in the tactical speech processing
environment the internals err.

Thus, the internal simulation allows for the selection and addition of distortion.
Distortion allows for a random number to be added to or subtracted from each of the
secondary scores without any alteration of the ground truth score assigned to the
correct internal. For example, selecting a distortion of 31 would produce the above
list with a random number between 1 and 31 added to or subtracted from the
secondary scores: L2 81, L1 80, L3 40, and L4 30. As illustrated, using this method the
internals simulate errors in their list since L2 now has a higher value than ground truth
language L1.
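The constant simulation and the distortion option can be illustrated with a short Python sketch. It is not the contractor's implementation: the function names are hypothetical, the random number is assumed to be an integer, and any clamping of scores to a fixed range is not modeled because the text does not describe it.

```python
import random

def constant_scores(n_items, truth, truth_score, secondary_score):
    """Constant simulation: one fixed score for ground truth, one for the rest."""
    return {f"L{i}": (truth_score if i == truth else secondary_score)
            for i in range(1, n_items + 1)}

def add_distortion(scores, truth_label, distortion, rng):
    """Add or subtract a random integer in [1, distortion] to each secondary score."""
    out = {}
    for label, score in scores.items():
        if label == truth_label:
            out[label] = score  # the ground truth score is never altered
        else:
            out[label] = score + rng.choice((-1, 1)) * rng.randint(1, distortion)
    return out

def ranked(scores):
    """Return (label, score) pairs ordered from highest to lowest score."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

base = constant_scores(4, truth=1, truth_score=80, secondary_score=50)
distorted = add_distortion(base, "L1", 31, random.Random(0))
```

Here `base` is the constant list L1 80, L2 50, L3 50, L4 50. In `distorted`, each secondary score has moved by between 1 and 31 points, so a secondary language may outrank L1, as in the example above.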

Another way of simulating errors in the internals is by selecting a random
simulation. In this type of simulation, the secondary scores would be created by a
random number generator. Assuming L1 is the ground truth language with the score
assigned as above, a report such as this could be obtained: L2 95, L1 80, L3 3, and
L4 1. Distortion could be selected by the experimenter to further alter these scores.

No matter whether constant or random is chosen for the simulation, the
generation  of  internals  creates  choices  around  ground  truth.  If  L5  is  the  ground  truth 
language  and  three  languages  are  chosen,  the  simulation  produces  scores  for  L4,  L5,  and 
L6.  This  definition  assumes  that  the  languages  that  are  next  to  each  other  in  the 
database  are  typically  confused  with  each  other. 
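The selection of choices around ground truth can be sketched as a window of database indices. The behavior at the ends of the database and for an even number of choices is not described in the text, so the handling below is an assumption, and the function name is hypothetical.

```python
def choices_around_truth(truth: int, n_choices: int, n_total: int) -> list[int]:
    """Indices of the internals reported for one transmission."""
    if not 1 <= truth <= n_total or not 1 <= n_choices <= n_total:
        raise ValueError("truth and n_choices must lie within 1..n_total")
    start = truth - (n_choices - 1) // 2  # center the window on ground truth
    # Assumed edge handling: slide the window so it stays inside the database.
    start = max(1, min(start, n_total - n_choices + 1))
    return list(range(start, start + n_choices))
```

For ground truth L5 and three choices in a twenty language database, the sketch returns indices 4, 5, and 6, matching the example above.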

The  final  important  characteristic  of  the  internal  simulation  is  the  ability  to 
reproduce  identical  internal  reports  from  experiment  to  experiment  by  the  selection  of 
reproducible  rather  than  dynamic  scoring.  If  an  experimenter  selects  dynamic  scoring,  a 
different  scoring  pattern  is  used  for  each  experiment.  The  selection  of  reproducible 
always produces the same score pattern.
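One common way to obtain this behavior is a seeded random number generator. The text does not say how the Testbed implements it, so the seed value, the 0 to 100 score range, and the function names below are all assumptions made for illustration.

```python
import random

def make_rng(reproducible: bool, seed: int = 1993) -> random.Random:
    """Reproducible scoring reuses a fixed seed; dynamic scoring draws a fresh one."""
    return random.Random(seed) if reproducible else random.Random()

def random_secondary_scores(n: int, rng: random.Random) -> list[int]:
    """Random simulation: secondary scores drawn uniformly from 0 to 100 (assumed range)."""
    return [rng.randint(0, 100) for _ in range(n)]

run1 = random_secondary_scores(3, make_rng(reproducible=True))
run2 = random_secondary_scores(3, make_rng(reproducible=True))
```

Under reproducible scoring `run1` and `run2` are identical, whereas two dynamic runs would generally differ.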

2.2  EXTERNALS/ELINT  PROCESSOR 

As  depicted  in  Figure  2,  the  externals'  values  and  confidence  scores  are  input  into 
the  Externals  Processor.  Similarly,  ELINT  data  is  input  into  the  ELINT  Processor.  Another 
major  input  is  their  respective  relations  table  which  is  established  by  experienced  radio 
station  listeners  in  a  training  session. 

The  process  of  creating  ELINT  and  external  relations  tables  is  identical.  These 
relations  tables  are  established  as  a  matrix  between  all  possible  external  or  ELINT  values 
and  internal  values.  For  example,  if  the  experimenter  is  interested  in  how  frequency 
contributes  to  the  recognition  of  Rush  Limbaugh,  all  radio  stations  of  interest  across  the 
country  must  be  searched  and  the  matrix  filled  in  upon  detection  of  his  voice  on  a 
certain  radio  frequency. 

The  scenario  report  in  Table  1  illustrates  the  relationships  between  speaker 
identification,  frequency,  modulation  type,  and  location.  These  relationships  can  be 
recorded  over  multiple  days  and  used  as  the  basis  for  the  relations  table  data.  Similar 
relations  can  be  developed  for  the  other  internals,  externals,  and  ELINT. 

Following  the  accumulation  of  the  raw  data,  normalization  of  the  numbers  occurs 
to produce weights between one and four. An experienced listener familiar with the
association between Rush Limbaugh and radio frequency can look at these final weights
and adjust them based on this knowledge. This adjustment is to be used only in
extenuating  circumstances.  For  instance,  if  we  know  Rush  Limbaugh  is  always  on  radio 
frequency  1310  at  12:00  noon,  the  expert  can  provide  a  weight  of  100  in  the  relations 
table  for  this  interrelationship. 

The  Processor  uses  these  relations  table  weights,  preestablished  relative  weights, 
and  the  simulated  external  values  with  confidence  scores  to  mathematically  obtain  a 
total  score  for  each  internal  in  the  matrix.  All  scores  are  then  normalized  between  0  and 
100.  The  final  output  of  the  algorithm  is  an  external  based  ranked  list  of  Language, 
Speaker  and  Platform  Identification  (LSP)  choices  and  confidence  scores.  In  addition,  the 
Externals  Processor  makes  an  assessment  of  the  activity  (A)  from  the  external  data 
presented,  creating  external  based  LSAP  decisions  and  scores  when  linked  with  the  LSP. 

A  separate  ELINT  Processor  produces  an  independent  set  of  LSAP  decisions  based 
on ELINT data. ELINT reports occur at different times than the simulated external and
internal  reports.  In  order  to  perform  the  LSAP  correlation  with  ELINT  at  the  same  time  as 
the  Externals  Processor,  an  initial  check  is  made  for  relevant  ELINT  data.  If  an  ELINT  report 
occurs  within  a  time  threshold  and  within  a  distance  threshold  of  the  simulated  external 
and  internal  report,  then  the  ELINT  data  is  relevant.  The  mathematics  and  normalization 
of this relevant data into ELINT based LSAP decisions and scores are identical to those of the
Externals Processor.
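The relevance check can be sketched as two threshold comparisons. The units, the planar coordinate representation, and the function name are assumptions; the text states only that a time threshold and a distance threshold are applied.

```python
import math

def elint_is_relevant(elint_time, elint_xy, report_time, report_xy,
                      time_threshold, distance_threshold):
    """True when an ELINT report is close enough in time and space to be used."""
    close_in_time = abs(elint_time - report_time) <= time_threshold
    close_in_space = math.dist(elint_xy, report_xy) <= distance_threshold
    return close_in_time and close_in_space
```

An ELINT report failing either comparison is ignored when the ELINT based LSAP decisions are formed.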

2.3  EXTERNAL/INTERNAL  DATA  FUSION 

Among  the  inputs  to  the  E/I  data  fusion  algorithm  are  the  external  and  ELINT 
based  LSAP  decisions  and  scores  as  described  in  Section  2.2.  Other  inputs  are  the  internal 
LSP  score  sequences  provided  by  the  simulations  described  in  Section  2.1.  The  final  input 
into  the  data  fusion  algorithm  is  an  associations  table  developed  by  accumulation  of  the 
known internal-internal interrelationships during a training session by an experienced radio
station listener.

A file named eidefaults contains default parameters and filenames for the data
fusion  algorithm.  These  parameters  are  listed  in  Appendix  A  with  the  system  default 
values  set  by  the  contractor  (Shirey  and  Morgan,  1993). 

Four  data  fusion  and  correlation  algorithms  were  developed.  Each  algorithm  can 
be  independently  tested  and  analyzed  by  the  experimenter  for  performance 
differences. 

Each data fusion algorithm performs a merge of the three input lists of decisions
and  scores.  Each  list  is  weighted  independently  from  0  to  100.  These  weights  are 
contained  in  the  eidefaults  file  and  can  be  altered  by  the  experimenter.  No  theoretical 
work  or  experiments  have  been  done  to  determine  optimum  values  for  these  weights. 
The  experiment  in  Chapter  5  of  this  thesis  analyzes  these  weights  in  detail. 

For example, to merge language identification (LID) results, W1 represents the
amount of weight provided to the results of the internal LID simulation, W2 represents the
amount of weight provided to the external based LID results, and W3 represents the
amount of weight provided to the ELINT based LID results. If A1 represents the score for
English provided by the LID simulation, B1 represents the score for English provided
by the external based LID component, and C1 represents the score for English provided
by the ELINT based LID component, the score provided by the merge is calculated by
the following formula:

$$\frac{W1 \cdot A1 + W2 \cdot B1 + W3 \cdot C1}{W1 + W2 + W3}$$

This  merge  takes  place  for  each  language  in  each  of  the  three  lists.  In  addition, 
this  merge  occurs  in  the  same  manner  for  speaker  and  platform  identification.  For 
activity  identification,  since  there  are  only  external  and  ELINT  based  inputs,  W1  and  A1 
are zero. When ELINT information is not used, C1 is zero and W3 is its default weight.
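The merge can be written as a weighted average. The sketch below uses hypothetical scores for illustration, and the function name `merge_score` is not taken from the Testbed.

```python
def merge_score(scores, weights):
    """Weighted average of internal, external based, and ELINT based scores."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one weight must be nonzero")
    return sum(w * s for w, s in zip(weights, scores)) / total_weight

# Hypothetical scores for English: internal A1, external based B1, ELINT based C1.
english = merge_score(scores=(80, 60, 40), weights=(100, 100, 100))
# Activity identification has no internal input, so W1 and A1 are zero.
activity = merge_score(scores=(0, 70, 50), weights=(0, 100, 100))
# ELINT not used: C1 is zero while W3 keeps its default weight.
no_elint = merge_score(scores=(80, 60, 0), weights=(100, 100, 100))
```

Under a literal reading of the last case, W3 remains in the denominator, so omitting ELINT lowers the merged score (about 46.7 here) rather than leaving it at the two-source average of 70.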

This  merge  algorithm  is  one  of  a  larger  class  of  algorithms  which  could  have  taken 
the three lists and produced one output sequence. The score sequences could have
been multiplied together in a probabilistic Bayesian sense, combined using radio station
inference rules, or processed using a blackboard expert system, but the RL contractor
evaluated  different  schemes  and  chose  to  implement  the  merge  previously  described 
due  to  its  computational  simplicity. 

The initial data fusion algorithm, Correlator 1, merges the input lists of decisions and
scores by equally weighting all decisions (W1, W2, and W3 = 100). Following this initial
ranking, all combinations of the top "beta" (set in the eidefaults file) number of LSAPs
are mathematically scored. The final step discards any resulting LSAP combinations that
do not appear in the associations table.
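The hypothesis generation step of Correlator 1 can be sketched as a Cartesian product over the top "beta" entries of each merged list. The text does not give the rule used to score a combination, so the mean of the four item scores below is an assumption, as are the function name and the sample data.

```python
from itertools import product

def correlator1(ranked, beta, associations):
    """ranked maps 'L', 'S', 'A', 'P' to lists of (label, merged score), best first."""
    top = [ranked[kind][:beta] for kind in ("L", "S", "A", "P")]
    hypotheses = []
    for combo in product(*top):
        labels = tuple(label for label, _ in combo)
        if labels not in associations:
            continue  # discard combinations absent from the associations table
        score = sum(s for _, s in combo) / len(combo)  # assumed scoring rule
        hypotheses.append((labels, score))
    return sorted(hypotheses, key=lambda h: h[1], reverse=True)

ranked = {
    "L": [("L1", 90), ("L2", 60)],
    "S": [("S1", 80), ("S2", 70)],
    "A": [("A1", 75), ("A2", 40)],
    "P": [("P1", 85), ("P2", 50)],
}
associations = {("L1", "S1", "A1", "P1"), ("L2", "S2", "A1", "P1")}
result = correlator1(ranked, beta=2, associations=associations)
```

With beta set to 2, sixteen combinations are generated and only the two that appear in the associations table survive.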

The second data fusion algorithm, Correlator 2, added a capability to change
the weights of the merge to any number between 0 and 100. For instance, if the
externals provide a poor measure of language identification, the external based LID
scores could be given a small weight to de-emphasize them in the merge algorithm.

Following the selection of the top "beta" number of LSAPs as in Correlator 1, all
resulting LSAP combinations are generated and scored. The algorithm then eliminates
one item from each remaining LSAP combination and scores the combination with this item
removed. For instance, if the combination (L1, S1, A1, P1) was generated, the algorithm
would score (L1, S1, A1, *), (L1, S1, *, P1), (L1, *, A1, P1), and (*, S1, A1, P1). A later
version of the correlator will attempt to fill in the "*" to recover a correct hypothesis not
generated by the algorithm. For example, (*, S1, A1, P1) would be scored as (L5, S1, A1, P1),
with the "*" replaced by the correct language L5.
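Generating the variants with one item removed is a small enumeration. In the following sketch the function name is hypothetical and the "*" marker follows the notation used in the text.

```python
def with_one_missing(hypothesis, missing="*"):
    """Return every variant of an LSAP tuple with exactly one item removed."""
    return [hypothesis[:i] + (missing,) + hypothesis[i + 1:]
            for i in reversed(range(len(hypothesis)))]

variants = with_one_missing(("L1", "S1", "A1", "P1"))
```

The four variants are produced in the same order as the example: platform removed first, then activity, speaker, and language.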

In addition, Correlator 2 keeps all good scoring LSAP combinations even if they
are not in the associations table but discards hypotheses which are not within "cutoff"
(set in the eidefaults file) percent of the top scoring hypothesis. For example, if the top
scoring LSAP had a score of 60 and the "cutoff" was 5%, then any final hypothesis with a
score of 56 or less would be discarded.
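The cutoff rule can be sketched as a filter relative to the top score. The inclusive boundary is inferred from the example (a score of 57 is kept when 56 or less is discarded); the function name and sample hypotheses are hypothetical.

```python
def apply_cutoff(hypotheses, cutoff_percent):
    """Keep (hypothesis, score) pairs scoring within cutoff_percent of the top score."""
    if not hypotheses:
        return []
    top = max(score for _, score in hypotheses)
    # Compare in scaled form to avoid floating point error at the boundary.
    return [(h, s) for h, s in hypotheses
            if s * 100 >= top * (100 - cutoff_percent)]

kept = apply_cutoff([("H1", 60), ("H2", 57), ("H3", 56)], cutoff_percent=5)
```

With a top score of 60 and a 5% cutoff, the threshold is 57, so H1 and H2 are kept and H3 is discarded.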

The third version, Correlator 3, performs the merge and generates hypotheses as
Correlator 2 does, but it assigns the missing item ("*" above) a numerical score. In addition,
the algorithm generates combinations with two missing items as above and attempts to
find  good  scoring  hypotheses.  A  new  parameter,  "gamma",  (set  in  the  eidefaults  file)  is 
used  to  check  the  generated  hypotheses  against  the  associations  table  entries  and 
discard  any  invalid  combinations.  Following  this  step,  an  iterative  algorithm  attempts  to 
fill  in  all  LSAP  missing  items  as  described  above.  The  final  step  keeps  all  scoring 
hypotheses  within  "cutoff"  percent  of  the  top  scoring  hypothesis. 

Correlator  4  implements  the  same  algorithm  as  Correlator  3  but  generates 
hypotheses  in  a  different  manner.  Correlator  3  performs  the  merge,  then  hypothesis 
generation.  Correlator  4  performs  hypothesis  generation  on  the  outputs  of  the  Externals 
Processor, ELINT Processor, and internals, then merges the generated hypotheses. If the
hypothesis is on all lists, the resulting scores are averaged. If a hypothesis from one list
matches a hypothesis on the other lists with a missing item, the complete hypothesis is put
on the final hypothesis list with the scores averaged. Finally, if the hypothesis is only
on one list, the scores are averaged with zeros. Only hypotheses within "cutoff"
percent of the top scoring hypothesis are reported.
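The merging rules of Correlator 4 can be approximated as follows. This is a simplified sketch under stated assumptions: each source contributes its best matching score, a source with no match contributes zero, and partial hypotheses that match no complete hypothesis are dropped. The function names and data layout are hypothetical.

```python
def matches(complete, partial, missing="*"):
    """True when partial equals complete wherever partial is not missing."""
    return all(p == missing or p == c for c, p in zip(complete, partial))

def correlator4_merge(lists, missing="*"):
    """lists: one dict per source mapping LSAP tuples to scores."""
    complete = {h for source in lists for h in source if missing not in h}
    merged = {}
    for hyp in complete:
        total = 0.0
        for source in lists:
            # Best score among entries equal to hyp or matching it through "*".
            candidates = [s for h, s in source.items() if matches(hyp, h, missing)]
            total += max(candidates, default=0.0)  # absent from a list: add zero
        merged[hyp] = total / len(lists)
    return merged

externals = {("L1", "S1", "A1", "P1"): 90.0}
elint = {("L1", "S1", "A1", "P1"): 60.0}
internals = {("L1", "S1", "*", "P1"): 75.0}
merged = correlator4_merge([externals, elint, internals])
single = correlator4_merge([{("L3", "S3", "A3", "P3"): 90.0}, {}, {}])
```

In `merged` the internal hypothesis with a missing activity matches the complete hypothesis, so the three scores are averaged. In `single` the hypothesis appears on only one list and is averaged with two zeros.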

In  all  four  Correlator  algorithms,  multiple  hypotheses  result  when  answers  are  within 
"cutoff"  percent  of  the  highest  scoring  hypothesis.  The  highest  scoring  hypothesis  is 
displayed  on  the  Testbed  output  screen  and  all  final  LSAP  hypotheses  are  recorded  in 
the  system  audit  file. 

2.3  SYSTEM  AUDIT  FILE 

For  every  experiment  performed  on  the  Testbed,  an  audit  file  is  created.  In  this 
audit  file,  all  simulation,  ELINT  Processor,  Externals  Processor  and  correlator  initialization 
parameters are recorded along with the routing table. For every transmission, all
simulated  internal,  ELINT,  and  external  inputs  are  recorded,  as  well  as  the  outputs  of  the 
Externals  and  ELINT  Processors.  Also  included  are  the  Testbed  final  LSAP  hypotheses  and 
the  scenario  ground  truth.  All  outputs  which  do  not  agree  with  the  scenario  ground 
truth are denoted so the researcher can try to determine why the error occurred.

The end of the system audit file has some summary statistics for the experiment.
All internal summary statistics represent the percentage correct internal identification
provided by the simulation. The ELINT and Externals Processor statistics indicate the
percentage accuracy of the internals given only ELINT and external data. In addition,
the Testbed provides merge algorithm summary statistics describing the accuracy of the
top scoring LSAP combination after the merging of all inputs. Other correlation statistics
after hypothesis generation provide the accuracy of the internals contained in the final
best scoring hypothesis and statistics on the accuracy of the final LSAP hypotheses.
Finally, routing statistics are provided to show how accurate and efficient the routing
decisions made by the E/I Data Fusion Testbed are. The statistics in this audit file are
invaluable  to  the  experiments  described  in  Chapters  3,  4,  and  5  of  this  thesis. 

2.4  MISCELLANEOUS 

The  Testbed  allows  for  the  choice  of  speed.  In  a  demonstration,  normal  speed 
will  play  the  scenarios  out  at  the  realistic  scenario  time.  For  experimentation,  speeds  of 
2X,  4X,  and  maximum  (Max)  are  provided.  The  2X  and  4X  cases  are  clocked  to  provide 
that  much  speed  up  to  the  scenarios.  In  the  Max  case,  the  simulated  input  records  are 
provided  to  the  correlator  as  fast  as  possible,  resulting  in  three  different  scenarios  (each 
composed of 45 minutes of data) being processed in approximately 7 minutes.

Chapter 3

INTERIM  EXPERIMENT 

This  chapter  describes  the  design,  performance  and  conclusions  drawn  from  a 
quick  experiment  performed  with  the  Testbed.  This  test  was  designed  and  performed  to 
demonstrate that the Testbed could be used in experimental analysis. The experimental
conclusions  described  are  not  proven,  just  measured  as  trends  since  the  amount  of 
data  processed  in  this  quick  test  is  not  significant.  Lessons  learned  from  this  quick 
experiment are followed in the thesis experiment described in Chapter 5.

3.1  INTERIM  EXPERIMENT  OBJECTIVE 

This short experiment was designed to determine the effect of all possible external
combinations on internals using data from three different scenario groups. In this
experiment, the number of externals was varied in combinations of 5, 4, 3, 2, 1, and 0.
The results were expected to vary as the scenarios and external combinations varied,
but it was hoped to find trends which would indicate the value and reliability of the
externals and their thirty-two different combinations.
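The thirty-two combinations are the subsets of the five externals, and they can be enumerated as shown below. The abbreviations follow those used in Table 2; the function name is hypothetical.

```python
from itertools import combinations

EXTERNALS = ("LOC", "DIR", "FREQ", "RT", "MOD")

def all_external_combinations(externals=EXTERNALS):
    """Every subset of the externals, from all five down to none."""
    return [combo
            for size in range(len(externals), -1, -1)
            for combo in combinations(externals, size)]

combos = all_external_combinations()
counts = {size: sum(1 for c in combos if len(c) == size) for size in range(6)}
```

The counts by size are 1, 5, 10, 10, 5, and 1 for five through zero externals, which sum to thirty-two.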

3.2  INTERIM  EXPERIMENT  DESIGN 

During experimental design, each of the internal and external simulations was
required to be set up, which provided additional familiarity with the various simulator
internal and external input options. In addition, before running this experiment, additional
familiarity was gained with the meaning of the different audit file summary statistics.

All  audit  file  summary  statistics  were  examined  for  applicability  to  this  experiment. 
All  routing  statistics  are  dependent  on  the  routing  table.  Since  this  experiment  was 
meant to be quick, no time was spent developing a realistic routing table; thus
these statistics were not used. For the experiment in Chapter 5 of this thesis, a realistic
routing table is created.

The merge algorithm summary statistics were not examined for applicability.
In the design of the experiment in Chapter 5, these statistics are considered.

In  this  quick  experiment,  the  ELINT  and  External  Processor  statistics  illustrating 
the  accuracy  of  the  Externals  Processor  and  ELINT  Processor  were  not  as  interesting  as 
the  difference  between  the  simulation  LSP  results  provided  in  the  internal  summary 
statistics  and  the  final  LSAP  results  in  the  correlation  statistics.  The  difference  between 
these  results  measures  the  amount  of  improvement  gained  by  using  the  correlation 
algorithm.  This  difference  was  used  extensively  in  this  quick  study  and  will  be  used  in 
the  experiment  in  Chapter  5  as  well. 

It  was  also  decided  to  calculate  the  average  improvement  over  three 
different  scenario  groups  for  comparison  purposes.  In  the  long  run,  these  statistics  are 
more  reliable  than  basing  them  on  only  one  single  group.  Statistical  means  and 
multiple  iterations  are  also  used  in  the  experiment  in  Chapter  5. 

3.2.1 ENSURING TESTBED CAPABILITIES

During experimental design, it was deemed essential to perform a quick test
to ensure that several of the system capabilities were properly functioning. These tests
were not extensively performed.

By making multiple runs using the same parameters, reproducible scoring for
the simulations was proven to provide the same data repeatedly; the runs produced
identical summary statistics. In addition, when exiting and
reentering the Testbed, these statistics were still the same. This reproducibility was
also tested by saving the experiment and looking at the summary results after later
recall. Though reproducible scoring worked for this experiment, Chapter 5 will illustrate that
reproducible tests can run only in this one specific case.

Constant ground truth and secondary scoring were verified by setting the
language and platform identification simulations to constant scoring and observing
the  nonvarying  scores  in  the  audit  file  report. 

Final  audit  file  statistics  were  proven  correct  in  a  ten  transmission  run.  Each 
statistical  category  was  analyzed  by  hand  and  compared  with  the  audit  file  results. 

3.2.2  INTERIM  EXPERIMENT  SETUP 

The  test  was  designed  to  create  a  speaker  identification  accuracy  between 
50  and  75%  and  measure  the  improvement  due  to  the  correlation  algorithm.  A 
reproducible  simulation  of  twenty  speakers  was  chosen,  with  the  ground  truth  score 
of  90.  Random  scoring  was  selected  for  the  other  nineteen  speakers  with  no 
distortion  added.  Only  five  scores  were  reported. 

The  language  and  platform  identification  simulations  were  chosen  as  constant 
ten item simulations with no distortion (100% accurate) to minimize the number of
varying  conditions.  The  externals  were  chosen  for  each  experiment  in  accordance 
with  the  objective  and  simulated  with  100%  confidence.  No  parameters  in  the 
eidefaults  file  were  altered. 

Three groups of scenarios were run through all combinations of externals:
(7,8,9),  (4,5,6),  and  (1,2,3).  The  experiments  were  performed  at  2X  speed  and 
processed  15  minutes  of  scenario  data.  The  decision  time  was  set  at  3  seconds.  The 
ELINT  simulation  was  not  used. 

3.3  INTERIM  EXPERIMENT  RESULTS  AND  ANALYSIS 

Measurements  were  made  of  the  speaker  identification  and  LSAP  accuracy 
before  and  after  correlation.  A  sample  of  the  performance  results  for  three  different 
experimental  conditions  is  included  in  Appendix  B.  The  detailed  performance  statistics 
are included in a report by Perz and Parker (1993). All analysis and conclusions are
derived  exclusively  from  the  statistics  in  that  report. 

Appendix  B  illustrates  the  statistics  gathered  and  analyzed.  The  table  shows 
the speaker and LSP simulator statistics and the subtraction of these values from their
respective correlator statistics resulting in percentage improvements. For example, the
LSP  correlated  score  for  5  externals  and  the  scenario  combination  (1,2,3)  equals  83.27%. 
This  value  minus  the  LSP  internals  correct  score  (64.94%)  results  in  an  18.33%  LSP 
improvement.  Results  are  also  provided  for  the  (4,5,6)  and  (7,8,9)  scenario  groups. 
Provided  for  each  different  set  of  externals  are  the  mean  and  standard  deviation  values 
over  the  three  scenario  groups.  Other  statistics  are  included  in  the  Appendix  that  were 
not  analyzed,  namely,  the  average  number  of  hypotheses  and  routing  statistics. 
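The improvement figure is a simple difference in percentage points. The short sketch below reproduces the example value; the function name is hypothetical.

```python
def improvement(correlated_percent: float, internals_percent: float) -> float:
    """Percentage-point gain from correlation; negative means correlation hurt."""
    return round(correlated_percent - internals_percent, 2)

lsp_gain = improvement(83.27, 64.94)
```

For five externals and scenario group (1,2,3), `lsp_gain` is 18.33, the LSP improvement quoted above.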

Analysis  for  the  best  external  combination  began  by  finding  the  best  individual 
percentage improvement scores for each of the three groups of scenarios. Many
inconsistent external combinations were eliminated and Table 2 was derived.
Inconsistent external combinations are those which scored well for one scenario
group, but scored poorly in another. For example, the combination modulation type,
radio type, and location showed improvement scores greater than 20% for scenario
group  (1,2,3),  but  had  negative  improvements  for  the  scenario  group  (4,5,6).  Seven 
combinations  stood  out  from  the  rest  in  that  they  had  better  speaker  and  LSP 
improvement  scores  for  all  three  scenario  groups.  Mean  and  standard  deviation  scores 
were  added  to  these  tables  to  better  compare  them. 

Both  parts  of  Table  2  show  that  the  external  combination  of  frequency,  radio 
type, and location has the largest mean improvement percentage with very close to
the best standard deviation. The value in the (4,5,6) column in both tables is close to
the  highest  while  the  (1,2,3)  and  (7,8,9)  values  exceed  all  other  values  in  their  respective 
columns. 

All  combinations  in  Table  2  include  the  frequency  external  (FREQ).  The  detailed 
results also show that frequency is the only external which, when used exclusively, results
in  performance  improvements.  Further  analysis  of  the  scenario  ground  truth  files 
discovered  that  no  scenario  includes  a  change  of  frequency  which  possibly  explains  this 
phenomenon.

EXTERNAL COMBINATIONS WITH BEST PERCENT SPEAKER IMPROVEMENTS

COMBINATION        (1,2,3)   (4,5,6)   (7,8,9)   MEAN     STANDARD DEVIATION
FREQ/RT/LOC        23.51     26.67     16.66     22.28    5.12
FREQ/MOD/LOC       21.91     26.67     16.66     21.75    5.01
FREQ/MOD/RT/LOC    21.12     27.22     16.05     21.46    5.59
FREQ/RT            21.91     27.78     14.19     21.29    6.82
FREQ/MOD           21.51     25.56     14.81     20.63    5.43
FREQ/MOD/RT        21.91     26.11     12.34     20.12    7.06
FREQ/DIR/RT/LOC    21.51     22.78     13.58     19.29    4.99

EXTERNAL COMBINATIONS WITH BEST PERCENT LSP IMPROVEMENTS

COMBINATION        (1,2,3)   (4,5,6)   (7,8,9)   MEAN     STANDARD DEVIATION
FREQ/RT/LOC        22.71     26.67     16.66     22.01    5.07
FREQ/MOD/LOC       20.72     26.67     16.66     21.35    5.03
FREQ/MOD/RT/LOC    20.32     27.22     16.05     21.20    5.64
FREQ/RT            20.72     27.78     14.19     20.90    6.80
FREQ/MOD           20.72     25.56     14.81     20.36    5.26
FREQ/MOD/RT        20.72     26.11     12.34     19.72    6.94
FREQ/DIR/RT/LOC    20.32     22.78     13.58     18.89    4.76

TABLE 2: PRELIMINARY EXPERIMENT RESULTS
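The mean and standard deviation columns of Table 2 can be recomputed from the three scenario group values. The sketch below does this for the FREQ/RT/LOC speaker row. The thesis does not state which form of standard deviation was used; for this row the sample form (dividing by n - 1) reproduces the tabulated value.

```python
from statistics import mean, stdev

# Speaker improvement for FREQ/RT/LOC over groups (1,2,3), (4,5,6), (7,8,9).
freq_rt_loc = [23.51, 26.67, 16.66]

row_mean = round(mean(freq_rt_loc), 2)
row_std = round(stdev(freq_rt_loc), 2)  # sample standard deviation (n - 1)
```

The results are 22.28 and 5.12, matching the MEAN and STANDARD DEVIATION entries for this row.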


For  this  database,  the  lack  of  frequency  change  was  designed  to  be  as  realistic 
as  possible  for  the  scenarios  generated.  However,  typically,  one  radio  station  listener 
monitors  multiple  frequencies  at  the  same  time. 

By  observing  these  statistics,  it  was  noticed  that  the  addition  of  directional 
information  seemed  to  hinder  performance.  Analysis  showed: 

1. With any one other external, many negative improvement scores resulted.

2. In the test with two other externals, only the one with frequency and
radio type failed to result in negative improvement scores.

3.  Using  all  externals  except  for  direction  resulted  in  the  best  improvement 
scores  for  a  four  external  combination. 

4.  Direction  plays  a  role  in  all  negative  improvements  at  one  time  or  another, 
except  for  the  MOD/RT/LOC  and  MOD/RT  combinations. 

Note  that  both  exceptions  included  in  item  4  in  the  previous  paragraph 
include  the  modulation  and  radio  type  externals.  These  two  externals  seem  to  work 
rather  weakly  together.  FREQ/MOD/RT  and  four  other  external  combinations 
including  both  modulation  and  radio  types  avoid  negative  improvements  because  of 
the  presence  of  the  frequency  external  or  the  other  external  information  to 
compensate  for  deficiencies  in  the  MOD/RT  interactions.  Omission  of  modulation  type 
for  the  four  external  tests  results  in  improvement  scores  second  only  to  the  omission  of 
the  direction  external.  Using  only  modulation  type  resulted  in  the  highest  negative 
improvement  scores  for  a  single  external. 

In  order  to  analyze  the  general  theory  of  data  fusion,  means  and  standard 
deviations  for  all  numbers  of  externals  were  calculated  as  shown  in  Table  3.  Note 
that  the  mean  speaker  improvement  scores  are  slightly  higher  than  the  mean  LSP 
improvement  scores.  In  addition,  the  mean  percentage  improvement  increases  as 
the number of externals is increased.


                       SPEAKER % IMPROVEMENT           LSP % IMPROVEMENT
NUMBER OF EXTERNALS    MEAN      STANDARD DEVIATION    MEAN      STANDARD DEVIATION
5                      17.96     5.51                  17.42     5.31
4                      17.35     6.17                  16.00     5.26
3                      13.14     8.92                  9.19      11.56
2                      12.01     6.34                  6.08      11.28
1                      -6.32     8.98                  -12.75    11.04
0                      -33.45    16.53                 -44.80    9.45

TABLE 3: PRELIMINARY EXPERIMENT DATA FUSION RESULTS
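The two trends noted for Table 3 can be checked directly from the tabulated means. The values below are copied from the table, ordered from zero to five externals; the helper name is hypothetical.

```python
# Mean improvement by number of externals (0 through 5), from Table 3.
speaker_mean = [-33.45, -6.32, 12.01, 13.14, 17.35, 17.96]
lsp_mean = [-44.80, -12.75, 6.08, 9.19, 16.00, 17.42]

def strictly_increasing(values):
    """True when each value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))

# Gain in mean speaker improvement from adding one more external.
gains = [round(b - a, 2) for a, b in zip(speaker_mean, speaker_mean[1:])]
speaker_above_lsp = all(s > l for s, l in zip(speaker_mean, lsp_mean))
```

Both sequences increase with the number of externals, and the speaker mean exceeds the LSP mean in every row. The gain from four to five externals is the smallest step, at 0.61 percentage points.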

3.4 INTERIM EXPERIMENT CONCLUSIONS

Given  the  poor  performance  of  the  direction  external  with  other  external 
combinations,  it  should  not  be  used  in  those  combinations.  Even  though  the  addition  of 
directional  data  should,  according  to  data  fusion  theory,  provide  for  better 
improvement  scores,  these  preliminary  experiments  question  its  use.  This  consideration 
will  be  taken  into  account  in  the  experiment  in  Chapter  5. 

The  database  and  results  illustrate  performance  of  the  Testbed  against  scenarios 
on  fixed  frequencies.  As  of  yet,  there  have  been  no  experiments  performed  to  test  how 
the  Testbed  correlation  algorithms  react  to  changes  in  frequencies.  This  would  involve 
modifying  the  database  as  discussed  in  Section  4.1.8. 

Based on the data in Table 2, the external combination FREQ/RT/LOC is the best
combination. Mean speaker and LSP improvement percentages are better than those
of  all  others.  This  external  combination  is  used  in  the  experiment  in  Chapter  5. 

The modulation and radio type externals can be useful if used in conjunction with
other  externals  as  well,  but  certainly  should  not  be  used  alone  or  together  by 
themselves.  This  recommendation  is  taken  into  account  in  the  experiment  in  Chapter  5. 

Based on the results in Table 3, data fusion theory does hold: as the number of
externals increased, the results improved. The difference between four and five
externals was small compared with the others, possibly due to the
clashes between modulation/radio type and location/direction.
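
The trend claimed above can be checked directly against the means in Table 3. The following is a minimal sketch, not part of the Testbed software; the function names are illustrative, and only the mean values are taken from the table.

```python
# Mean improvement (%) by number of externals, copied from Table 3.
TABLE_3_MEANS = {
    5: {"speaker": 17.96, "lsp": 17.42},
    4: {"speaker": 17.35, "lsp": 16.00},
    3: {"speaker": 13.14, "lsp": 9.19},
    2: {"speaker": 12.01, "lsp": 6.08},
    1: {"speaker": -6.32, "lsp": -12.75},
    0: {"speaker": -33.45, "lsp": -44.80},
}


def gains_per_added_external(metric):
    """Return {n: mean(n) - mean(n - 1)} for n = 1..5 (hypothetical helper)."""
    return {
        n: round(TABLE_3_MEANS[n][metric] - TABLE_3_MEANS[n - 1][metric], 2)
        for n in range(1, 6)
    }


def improves_with_every_external(metric):
    """True if each added external raises the mean improvement."""
    return all(gain > 0 for gain in gains_per_added_external(metric).values())


speaker_gains = gains_per_added_external("speaker")
lsp_gains = gains_per_added_external("lsp")

print("Speaker gains per added external:", speaker_gains)
print("LSP gains per added external:    ", lsp_gains)
print("Monotonic:", improves_with_every_external("speaker"),
      improves_with_every_external("lsp"))
```

For both measures the smallest gain is the step from four to five externals, which is the small difference noted above.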

More tests should be run for statistically significant conclusions. This experiment
only considered three scenario groups covering all of the possible external
combinations; therefore, the results obtained cannot match the reliability of
many tests. In addition, only 15 minutes of data were run as opposed to the full 45
minute scenarios. The thesis experiment in Chapter 5 corrects for these deficiencies.

Lastly,  this  experiment  did  not  consider  the  routing  statistics  or  the  merge 
algorithm's  summary  statistics.  The  experiment  in  Chapter  5  defines  a  pertinent  routing 
table and analyzes the routing statistics as well as the results of the merge algorithm.

Chapter 4

POSSIBLE  EXPERIMENTS 

This chapter describes multiple experiments that could be performed with the
Testbed. These experiments were created exclusively for this thesis. Some ideas were
taken from Shirey and Morgan (1993) and Parker (1993).

There  is  no  long  term  goal  for  all  of  these  experiments.  Some  experiments  were 
defined  to  evaluate  different  parameter  settings  of  the  Testbed.  Others  were  to  clarify 
or  quantify  unsubstantiated  claims  made  by  the  contractor.  Some  experiments  were 
designed to provide some feedback on externals to the operational community.
Finally, some were described to provide some feedback to the speech processing
scientific community.

Each  experimental  description  includes  an  initial  statement  describing  the 
objective,  a  general  experimental  procedure  including  some  of  the  variables  chosen  to 
be  constant  and  some  of  the  variables  to  be  varied,  and  some  statement  of  the 
expected  final  result.  There  was  no  attempt  to  define  the  specific  experimental  design 
and performance measurements for each variable (i.e., appropriate values for "cutoff",
"beta", and "gamma" were not defined). Finally, at the end of this Chapter a
justification  is  given  for  which  experiment  is  to  be  defined  in  detail  and  performed  in 
Chapter  5. 

4.1  EXPERIMENTS 

Ten  different  experiments  were  defined  in  the  sections  which  follow.  All 
experiments  will  include  several  three-scenario  groups,  include  the  full  45  minutes  of 
ground truth data, and be performed at Max speed. The 45 minute simulation length is
the maximum run length for the scenarios and was chosen to give enough data to
provide  the  maximum  amount  of  scenario  variability.  Max  speed  allows  for  this  amount 
of  data  to  be  run  in  about  7  minutes,  allowing  multiple  iterations  to  be  processed 
quickly. 

The  external  set,  when  not  explicitly  varied  in  the  description,  agrees  with  the  best 
external  set  chosen  during  the  quick  experiment  in  Chapter  3  (frequency,  radio  type, 
location).  No  additional  thought  was  given  as  to  whether  another  choice  was  better. 

In the experiments where the correlation algorithm number is not explicitly varied,
Correlator 3 was chosen. This correlation algorithm was used the most during RL's
development contract with HRB Systems.

When  not  explicitly  varied  in  the  experiments,  realistic  values  will  be  selected  and 
kept constant throughout all iterations of the experiment for the following variables:
correlation merge weights; correlation parameters "cutoff", "beta", and "gamma";
Externals Processor and ELINT Processor relative weights; frequency and location error
tolerances; and ELINT time and distance thresholds. These parameter values will be
operationally realistic where possible.

For  all  experiments,  the  decision  report  time  will  be  five  seconds,  as  this  parameter 
never  changes  the  results  provided.  In  addition,  unless  explicitly  changed  in  the 
experiments,  training  for  the  relations  tables  and  the  association  table  will  be  based  on 
100%  of  the  database.  Finally,  the  flight  path  value  will  be  set  and  not  altered  unless 
explicitly changed in the experiment since the direction external was not a part of the
selected  external  set. 

In  all  cases,  an  appropriate  routing  table  will  be  defined  and  used.  This  routing 
table  will  be  created  in  an  operationally  realistic  manner. 

The  following  ten  sections  describe  the  different  experiments: 

4.1.1  CORRELATOR  3  PERFORMANCE 

The objective of this experiment is to perform detailed test and evaluation of
Correlator 3. There has been no long term evaluation of Correlator 3 in terms of
performance. 

General Experimental Methodology: Run several different internal simulations with
various numbers of speakers, languages, and platforms. Run some experiments
with ELINT. Evaluate performance of Correlator 3 algorithm.

Final  Result:  Correlator  3  algorithm  improves/degrades  speaker,  language, 
activity  and  platform  identification  accuracy  by  X%. 

4.1.2  CORRELATOR  3  VERSUS  CORRELATOR  4 

The  objective  of  this  experiment  is  to  compare  Correlator  3  with  Correlator  4.  This 
would  show  the  differences  between  HRB's  correlation  algorithm  (3)  and  HRB's  version 
of RL's correlation algorithm (4).

General Experimental Methodology: Run several different internal simulations with
various numbers of speakers, languages, and platforms. Run some experiments
with ELINT. Run all simulations through both Correlator 3 and Correlator 4 and
compare the results.

Final Result: Correlator 3 or 4 is the better correlation algorithm.

4.1.3  BEST  EXTERNAL  SET 

The  objective  of  this  experiment  is  to  determine  what  set  of  externals  gives  the 
most  correlation  improvement  to  speaker  identification.  This  is  similar  to  the  experiment 
in  Chapter  3  except  the  ELINT  parameter  would  be  added  and  other  adjustments 
made  to  make  the  experiment  more  realistic.  For  all  iterations  of  this  experiment, 
platform  and  language  identification  would  be  kept  constant. 

General Experimental Methodology: Run all combinations of 1-5 externals and
different  flight  paths.  Run  several  different  speaker  identification  simulations  with 
different  numbers  of  speakers.  Ideally,  run  the  same  simulations  through  all 
combinations  of  externals  and  compare  the  results. 

Final Result: This combination of externals provides the best E/I correlation
performance.
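
The number of iterations implied by "all combinations of 1-5 externals" can be enumerated with a short sketch. The abbreviations FREQ, MOD, RT, and LOC follow Chapter 3; DIR is an assumed abbreviation for the direction external, and the function name is illustrative only.

```python
from itertools import combinations

# The five externals. "DIR" is an assumed label for the direction external.
EXTERNALS = ("FREQ", "MOD", "RT", "LOC", "DIR")


def external_sets(externals=EXTERNALS):
    """Yield every non-empty combination of externals, smallest sets first."""
    for size in range(1, len(externals) + 1):
        for combo in combinations(externals, size):
            yield combo


all_sets = list(external_sets())
print(len(all_sets), "external sets to test")
for combo in all_sets:
    print("/".join(combo))
```

Five externals give 2^5 - 1 = 31 non-empty sets, so each simulation would have to be repeated 31 times to cover every combination.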

4.1.4 OPTIMAL PARAMETER SET

The objective of this experiment is to determine what parameters provide the most
correlation  improvement. 

General  Experimental  Methodology:  Run  several  different  internal  combinations  with  a 
set number of speakers, languages and platforms through different error tolerances for
frequency and location; different external, internal and ELINT merge weights; different
values for the correlation parameters "cutoff", "beta", and "gamma"; different Externals
and  ELINT  Processor  relative  weights;  and  different  ELINT  relevant  time  and  distance 
thresholds.  Run  some  experiments  with  ELINT.  Compare  the  results  over  multiple 
iterations. 

Final Result: These parameter values provide the best Correlator 3 performance.

4.1.5  LIMITED  TRAINING  DATA 

The  objective  of  this  experiment  is  to  determine  the  effect  of  limited  training  for  the  ELINT 
and  Externals  Processor  relations  tables  and  associations  table  on  Correlator  3  performance. 
Some  limited  training  data  tables  have  already  been  created  for  this  test. 

General  Experimental  Methodology:  Run  several  different  internal  combinations 
with  a  set  number  of  speakers,  languages  and  platforms.  Run  some  experiments 
with  ELINT.  Ideally,  run  the  same  simulations  but  vary  the  amount  of  data  the  ELINT 
and  Externals  Processor  relations  tables  and  associations  table  are  trained  on.  Evaluate 
differences  in  performance. 

Final  Result:  This  correlation  algorithm's  performance  improves/degrades  by  X% 
when  training  the  tables  with  Y%  of  the  database. 

4.1.6  FOUR  CORRELATION  ALGORITHM  COMPARISON 

The  objective  of  this  experiment  is  to  compare  the  performance  of  Correlator  1  vs. 
Correlator  2  vs.  Correlator  3  vs.  Correlator  4.  This  would  show  the  differences  between  all  four 
correlation algorithms and the conditions under which each algorithm
performed well.

General Experimental Methodology: Run several different internal simulations with
various numbers of speakers, languages, and platforms. Run some experiments
with ELINT. Ideally, run the same data through all four algorithms and compare
the results.

Final  Result:  This  correlation  algorithm  performs  well  in  these  circumstances. 

4.1.7  LARGE  POPULATION  SPEAKER  IDENTIFICATION 

The  objective  of  this  experiment  is  to  determine  the  improvement  in  large 
population  speaker  identification  (300  speakers)  using  E/I  correlation.  This  would  also 
involve  developing  multiple  ground  truth  files  with  300  speakers  and  appropriate 
association and relations tables. In addition, this would involve modifying the software
to vary the scores of the 300 speaker IDs like the current simulation software.

General  Experimental  Methodology:  Run  multiple  simulations  with  300  speakers 
and constant language and platform simulations. Run some experiments with
ELINT.  Compare  the  performance  before  and  after  correlation. 

Final Result: E/I correlation improves/degrades large population speaker
identification  accuracy  by  X%. 

4.1.8  NEW  EXTERNAL/INTERNAL  DATABASE 

The  objective  of  this  experiment  is  to  determine  performance  of  Correlator  3  on 
another  E/I  database.  The  Testbed  is  currently  built  around  a  specific  database  with 
specific  ground  truth  files.  This  will  convert  and  test  E/I  against  this  new  database.  This 
will  require  the  development  of  the  following  new  items:  scenarios,  ground  truth  files, 
associations  table  and  Externals  and  ELINT  Processor  relations  tables. 

General Experimental Methodology: For the new database, run several different
internal simulations with various numbers of speakers, languages, and platforms.
Run some experiments with ELINT. Evaluate the performance of Correlator 3.

Final Result: The E/I correlation and merge algorithms perform this well on this new
database.  This  demonstrates  the  applicability  of  this  technology  to  other 
databases.  In  addition,  an  actual  operational  database  could  be  developed, 
rather  than  the  present  semi-realistic  simulation. 

4.1.9  VALUE  OF  DIRECTION  EXTERNAL 

The  objective  of  this  experiment  is  to  determine  the  value  of  the  direction 
external.  This  would  show  the  operational  value  of  this  parameter,  since  the  current 
field  system  does  not  provide  it  to  a  radio  station  listener. 

General Experimental Methodology: Select external set (except location, see
Chapter 3). Run several different internal simulations with varying numbers of
speakers,  languages,  and  platforms.  Run  some  experiments  with  ELINT.  Evaluate 
performance  for  each  iteration  with  and  without  directional  data. 

Final  Result:  The  value  to  the  E/I  correlation  algorithm  is  X%  when  directional 
information  is  provided  along  with  the  other  externals. 

4.1.10  ELINT/EXTERNAL  DATA  FUSION 

The  objective  of  this  experiment  is  to  determine  the  value  of  fusing  externals  with 
ELINT.  The  unique  ELINT,  external,  and  internal  simulation  capability  provided  by  this 
Testbed  allows  for  this  comparison  to  be  performed. 

General Experimental Methodology: Run several different internal simulations with
a  varying  number  of  speakers,  languages,  and  platforms.  Evaluate  performance 
for  each  iteration  with  and  without  ELINT  information. 

Final  Result:  The  amount  of  improvement/degradation  due  to  the  merging  and 
correlation  of  ELINT  data  with  externals  and  internals  is  X%. 

4.2  EVALUATION 

The  above  ten  experiments  were  analyzed  to  determine  which  experiment  to  run 
in detail. The factors examined were operational relevance, interest to the speech
community, ease of use (which included setup time), and the experiment's interest to
me.  The  results  are  summarized  in  Table  4. 


                                    COMPARATIVE   COMPARATIVE   COMPARATIVE   COMPARATIVE
                                    OPERATIONAL   INTEREST      EASE OF USE   INTEREST
EXPERIMENT NAME                     RELEVANCE     TO SPEECH                   TO ME
                                                  COMMUNITY

1.  CORRELATOR 3 PERFORMANCE        MEDIUM        MEDIUM        HIGH          MEDIUM
2.  CORRELATOR 3 VS. 4              NOT EVALUATED
3.  BEST EXTERNAL SET               HIGH          LOW           HIGH          MEDIUM
4.  OPTIMAL PARAMETER SET           MEDIUM        MEDIUM        HIGH          HIGH
5.  LIMITED TRAINING DATA           --            MEDIUM        VERY LOW      LOW
6.  CORRELATOR 1 VS. 2 VS. 3 VS. 4  NOT EVALUATED
7.  LARGE POPULATION SPEAKER ID     --            HIGH          LOW           LOW
8.  NEW E/I DATABASE                MEDIUM        MEDIUM        LOW           LOW
9.  VALUE OF DIRECTION EXTERNAL     HIGH          LOW           VERY LOW      LOW
10. ELINT/EXTERNAL DATA FUSION      HIGH          LOW           --            --

TABLE 4: EXPERIMENT TRADE-OFF ANALYSIS


Correlation algorithms 1, 2, 3, and 4 (Test 6) and the comparison of Correlator 3
versus 4 (Test 2) were not evaluated. It was preferred to perform more iterations
evaluating the performance of Correlator 3 (Test 1), rather than spend the larger number of
iterations and time needed to effectively prove Test 2 and Test 6.

The  experiments  with  the  most  operational  relevance  were  the  direction  (Test  9), 
ELINT/Externals  data  fusion  (Test  10),  best  external  set  (Test  3),  and  limited  training  data 
(Test 5) experiments. Conclusions reached as a result of these tests would influence
operational  system  design. 

Tests of data fusion with large population speaker identification (Test 7) are of high
interest to the speech processing community. The technical difficulty of separating 300
speakers purely on their speech parameters has been recognized, as most researchers
currently only work on groups of 50 or fewer. It is theorized that breaking the
300 into groups of 50 using externals may be one way of handling the recognition of this
large number of speakers.

All the operational tests (Tests 3, 9, and 10) are of low interest to speech
researchers except for the limited training experiment (Test 5). Since the
operational environment is, in general, limited in the amount of audio data for training
speech processing systems, researchers would be moderately interested.

Any item was rated very low on the ease of use parameter when it was required
to run identical data through several different test iterations to prove the conclusion. In
Section 5.3.1.2, the inability of the Testbed to produce more than one set of
reproducible numbers is described.

Any experiment which required modifying or creating a new database (Tests
7 and 8) was rated low on the ease of use criterion. The creation and addition of data files
would require an enormous amount of setup time.
Experiments  rated  highly  on  the  ease  of  use  parameter  are  given  that  rating  since  the 
performance  of  these  experiments  is  trivial  once  the  experiment  is  designed  and  the 
variables  are  set. 

Finally,  interest  for  me  was  rated  as  high  for  the  optimization  experiment  (Test  4). 
The  Testbed  was  delivered  by  the  contractor  with  default  numerical  settings  for  all 
parameters. Though some thought went into the numerical settings, no
optimization  was  made.  Running  any  of  the  other  experiments  with  random  settings  for 
these  parameters  would  give  questionable  significance  to  any  results. 

Experiment 4 is the most desirable of the options since it scored the best on the
total evaluation criteria. The experimental description in Section 4.1.4 cited
performing experiments to optimize many different parameters. It was later determined
that the most important parameters to run experiments with are the merge weights.
These weights have a direct bearing on the output scores of the merge algorithm and
an indirect bearing on the final hypotheses (since the hypothesis generation algorithm
uses the scores resulting from the merge algorithm).

"Cutoff", "beta", and "gamma" are important parameters only to the hypothesis
generation stage of the correlation algorithm. The remaining parameters, other than
the merge weights, only have an influence on the Externals and ELINT Processor inputs
to the merge algorithm. Though these parameters are important, they are not as
important to optimize as the merge weights.
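
The trade-off behind the choice of Experiment 4 can be illustrated with a simple tally. The sketch below is only one possible reading of "total evaluation criteria": the point values assigned to each rating and the equal weighting of the four factors are assumptions, not the method used in the analysis. Only the experiments for which Table 4 lists all four ratings are included.

```python
# Assumed point scale; the thesis does not state numerical values.
RATING_POINTS = {"VERY LOW": 0, "LOW": 1, "MEDIUM": 2, "HIGH": 3}

# Ratings from Table 4, in the order: operational relevance, interest to the
# speech community, ease of use, interest to the author.
TABLE_4_RATINGS = {
    "1. Correlator 3 performance": ("MEDIUM", "MEDIUM", "HIGH", "MEDIUM"),
    "3. Best external set": ("HIGH", "LOW", "HIGH", "MEDIUM"),
    "4. Optimal parameter set": ("MEDIUM", "MEDIUM", "HIGH", "HIGH"),
    "8. New E/I database": ("MEDIUM", "MEDIUM", "LOW", "LOW"),
    "9. Value of direction external": ("HIGH", "LOW", "VERY LOW", "LOW"),
}


def total_score(ratings):
    """Sum the assumed point values with equal weight per factor."""
    return sum(RATING_POINTS[rating] for rating in ratings)


totals = {name: total_score(ratings) for name, ratings in TABLE_4_RATINGS.items()}
best = max(totals, key=totals.get)

for name, score in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{score:2d}  {name}")
print("Highest total:", best)
```

Under these assumptions the optimal parameter set experiment has the highest total, which agrees with the selection made above.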


Chapter  5 


THESIS  EXPERIMENT 

This  chapter  describes  in  detail  the  thesis  experiment.  The  chapter  contains  an 
introduction, a prediction for the resulting values of the weights, as well as the
experimental design, analysis, and conclusions.

In several parts of this chapter, operational values and procedures are cited. This
information was obtained from conversations with experienced radio station listeners
Norm Lambert of Mei Technology Corporation and Dave Morgan of HRB Systems Inc. The
useful operational knowledge of Jim Cupples of Rome Laboratory was also provided
verbally.

5.1  INTRODUCTION 

The  objective  of  this  experiment  is  to  perform  tests  to  optimize  the  merge  weights 
for  ELINT  based  internals,  speech  based  internals,  and  external  based  internals.  Since 
there  is  no  other  existing  capability  to  perform  this  integration  of  information  from  these 
three  processes,  this  experiment  will  be  the  initial  investigation  of  the  relationships 
between  ELINT,  externals,  and  internals. 

One could ask: if one of the processes is consistently better than the others,
then why run all of them? RL has not done any Research and Development to prove
whether internal based speaker identification (ID) is any better than external based
speaker ID or ELINT based speaker ID. This holds true for the other internals as well. This
is  part  of  the  reason  for  the  development  of  the  E/I  Testbed. 

Internals are only as good as the training. In the operational environment, a
multitude of problems (noise, radio station mistuning, speaker variability (e.g., loud, soft,
fast, slow)) cause degradation in speaker ID performance since it is impossible to train
the system on all conditions for every speaker.

The  E/I  contractor  declared  that  external  based  speaker  identification  is  not  as 
good  as  internal  based  speaker  identification  (Shirey  and  Morgan,  1993)  and  thus 
should  be  weighted  less.  However,  there  are  77  speakers  in  the  database  and  this 
large  number  causes  poor  external  based  speaker  ID  performance.  Typical  contractor 
internal  simulations  used  only  10  or  20  speakers,  creating  a  better  likelihood  of  achieving 
a higher percentage. If there were 77 languages and 77 platforms in the
E/I database (there are only 6 and 4, respectively), the same behavior would be expected for these processes.
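
One rough way to see the effect of population size is the accuracy expected from a uniform random guess among N candidates. This is only an illustrative baseline and is not a model of how the Externals Processor actually scores candidates; the function name is hypothetical.

```python
def chance_accuracy(num_candidates):
    """Percent correct expected from a uniform random guess (baseline only)."""
    return 100.0 / num_candidates


# Population sizes mentioned in the text.
populations = [
    ("speakers in the E/I database", 77),
    ("speakers in a typical contractor simulation", 20),
    ("speakers in a typical contractor simulation", 10),
    ("languages in the E/I database", 6),
    ("platforms in the E/I database", 4),
]

for label, count in populations:
    print(f"{count:3d} {label}: {chance_accuracy(count):5.2f}% by chance")
```

The baseline rises as the population shrinks, which is consistent with the observation that smaller speaker populations make a high percentage easier to achieve.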

While  performing  experiments,  observation  showed  ELINT  based  internal 
recognition percentages in the teens, leading to a theory that ELINT is not a
good predictor. This is why, in the general experimental methodology in Chapter 4,
experiments are recommended to be performed without ELINT.

Despite  the  fact  that  each  individual  process  errs,  it  is  necessary  to  run  all  of  them 
simultaneously. In internal based speaker ID (as well as language and platform
identification), features are extracted and then fed into a pattern classification algorithm to
make a decision. In several recent RL experiments (Fenstermacher and Smith, 1994) it
was noticed that different feature sets through the same classifier erred differently when
the decisions were compared. In addition, it was noticed that the same features fed
into multiple classifiers also erred differently. Thus, it was theorized that using multiple
feature  sets  and  multiple  classifiers  along  with  some  merging  algorithm  would  provide 
better  results  than  only  using  one  of  them. 

This reasoning extends to the merging of ELINT, externals and internals as in the E/I
Testbed. The three processes all err differently. Thus one expects, as above, that the
merge would provide more accurate results than each process
alone.

5.2 WEIGHT EXPECTATIONS

Based  on  the  overall  poor  performance  of  ELINT  for  each  of  the  internal 
processes,  it  is  expected  to  be  weighted  less  than  the  others.  In  addition,  some 
experiments are performed without ELINT to compare its performance to experiments
with  ELINT. 

The values for the external weights are not predictable. If the external weights
are low, the poor external based speaker identification performance may be
de-emphasized at the expense of good external based language identification
performance. 

It  cannot  be  predicted  that  the  weights  for  each  of  the  three  processes  will  be 
equal.  RL  has  seen  the  internal  systems  perform  differently  on  different  databases.  It 
can  only  be  assumed  at  this  point  that  the  external  and  ELINT  based  processes  will 
similarly  vary  in  different  parts  of  the  world. 

The  real  answer  to  this  question  will  be  determined  when  the  system  is  fielded  in 
the  future  at  an  operational  site,  and  field  data  is  played  into  the  internals  and  real 
externals  and  ELINT  input  to  the  other  processes.  The  key  is  for  representative  data  to 
be obtained beforehand to run through the E/I Data Fusion Testbed to ensure that the
selected  weight  set  will  provide  "satisfactory"  E/I  performance  (whatever  the  definition 
of  satisfactory  is).  This  is  not  necessarily  the  optimal  performance. 

This  thesis  tests  whether  one  weight  set  is  optimal  on  the  semi-realistic  simulated 
data  by  running  20  iterations  of  scenario  groups  (1,2,3)  and  (4,5,6)  and  (7,8,9).  These 
groups have different external as well as different internal characteristics. If the "best"
weight  sets  are  different,  then  this  gives  an  indication  that  there  should  be  different 
weight  sets  for  different  databases. 

5.3  EXPERIMENTAL  DESIGN 

For  every  experiment,  a  spreadsheet  was  designed.  This  spreadsheet  has  an 
experiment name relevant to the experiment and the parameters established for all
variables. An example of this spreadsheet is included in Appendix C. The following
sections  describe  the  data  on  this  spreadsheet. 

5.3.1  INTERNALS 

Real  internal  outputs  could  not  be  input  to  the  Testbed  for  this  experiment.  The 
platform identification algorithms are still in the 6.2 exploratory development stage.
Though speaker and language identification software are resident in the RL Speech
Processing Facility, the outputs are on a second by second basis, not the transmission by
transmission decisions required by the E/I Testbed. Some modifications to these
algorithms would be required to make their method of reporting compatible with what
the  Testbed  expects. 

If all the systems existed and transmission by transmission processing were provided,
the Testbed software would be required to handle the outputs from the internals in the
interface format provided by the actual systems. This software has not been developed
for this interface protocol.

Thus internal simulations were designed, as in the quick experiment in Chapter 3,
as realistically as possible. It was desired to run twenty speakers, ten languages, and
ten platforms with a percentage accuracy of between 70% and 75%. These percentages are
similar to the actual performance of the RL systems on operational data.

Since the real systems provide confidence scores with all their decisions, the
number  of  scores  to  report  was  set  equal  to  the  number  of  internals,  contrary  to  the 
quick  experiment  in  Chapter  3  which  only  selected  five  scores  from  twenty  speakers. 

5.3.1.1 OBTAINING REALISTIC ID ACCURACIES

There  was  some  design  difficulty  creating  realistic  percentage  accuracies  for 
these  internal  simulations.  With  the  speaker  ID  ground  truth  score  set  to  90,  only  13% 
correct  ID  accuracy  was  obtained  using  a  random  simulation  for  the  secondary  scores 
with  no  added  distortion.  When  the  score  was  increased  to  95,  only  41%  correct  ID 
accuracy was obtained. After several iterations, a default first choice score of 98 was
finally settled on to give the desired accuracy (68.6%). Similar iterations were performed
for  language  ID  and  platform  ID,  establishing  a  95  ground  truth  score. 

Though  the  E/I  Testbed  provides  the  capability  to  use  ten  languages  and 
platforms,  there  are  only  six  languages  and  four  platforms  in  the  ground  truth  database. 

If language seven (L7) was given a score of 96 by the simulation, the Testbed counted it
as an error since it exceeded the 95 language ID ground truth score. However, the
merge with zero (since L7 is not in the Externals Processor database) resulted in the
ground truth language's merged score being higher (since the Externals Processor gave it
a score because it is in the database). This results in numerous corrections made by the
correlation process and a large statistical bias when comparing the merged score
results with the simulation results. To correct for this, the simulations for platform and
language identification were fixed at the number in the database.
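
The bias described above can be reproduced with a toy example. The sketch assumes the merge is a weighted sum of the scores from each process, with a candidate that is missing from a process contributing zero; the Testbed's actual merge algorithm is not reproduced here. The ground truth language "L3", the externals score of 40, and the equal weights are hypothetical values chosen for illustration.

```python
def merge_scores(internal, external, w_internal=1.0, w_external=1.0):
    """Weighted-sum merge (an assumed form); absent candidates contribute 0."""
    candidates = set(internal) | set(external)
    return {
        c: w_internal * internal.get(c, 0.0) + w_external * external.get(c, 0.0)
        for c in candidates
    }


# Simulated internal scores: L7 (not in the database) outscores the ground
# truth language, so the simulation alone counts this transmission as an error.
internal_scores = {"L3": 95.0, "L7": 96.0}

# Hypothetical Externals Processor output: L7 is not in its database, so only
# the ground truth language receives a score.
external_scores = {"L3": 40.0}

merged = merge_scores(internal_scores, external_scores)
choice_before = max(internal_scores, key=internal_scores.get)
choice_after = max(merged, key=merged.get)

print("Before merge:", choice_before)
print("After merge: ", choice_after, merged)
```

Every such transmission is counted as a correction by the correlation process, which inflates the apparent improvement and is why the simulated populations were limited to the candidates present in the database.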

The  bias  could  not  be  totally  removed,  however.  Due  to  the  simulator  definition 
of  "around",  whenever  L5  and  L6  are  the  ground  truth  for  the  language  ID  simulation,  L7 
and L8 are generated with scores. This bias is minimized, however, since these two
languages  occur  only  23  times  out  of  1700  transmissions  in  the  database. 

Removing  bias  for  platform  ID  presented  another  design  difficulty.  The  platform  ID 
simulation  provided  an  error  message  when  setting  the  number  of  platforms  to  four.  The 
message  indicated  that  the  field  would  only  accept  numbers  between  5  and  20.  This 
was  reported  to  the  contractor  and  it  was  noted  that  it  was  important  to  change  the 
number  of  scores  chosen  to  four  before  changing  the  number  in  the  database  to  four. 

In  addition,  it  was  observed  that  due  to  the  definition  of  "around",  P5  was 
simulated  whenever  P2,  P3,  and  P4  were  the  ground  truth  platform.  This  occurs  many 
times  in  the  database. 

After  discussions  with  the  contractor,  an  investigation  of  constant  internal  scoring 
with  distortion  rather  than  the  above  random  scoring  showed  potential  for  the 
simulation. Using only language ID, a constant ground truth score of 80 was set with
constant secondary scoring of 70 and a random distortion of 11. This produced an
acceptable language ID recognition percentage with scores distributed between 59
and 81.
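
A minimal sketch of constant scoring with distortion is shown below. How the Testbed simulator applies distortion is not documented here, so the sketch assumes the ground truth score is held at 80 and each secondary score is 70 plus a uniform integer offset of up to 11 in either direction, which reproduces the reported 59 to 81 score range. The language labels and function names are hypothetical.

```python
import random

LANGUAGES = ["L1", "L2", "L3", "L4", "L5", "L6"]  # six languages in the database


def simulate_scores(truth, candidates, truth_score=80, secondary_score=70,
                    distortion=11, rng=random):
    """Score one transmission. Assumption: only secondary scores are distorted."""
    scores = {}
    for candidate in candidates:
        if candidate == truth:
            scores[candidate] = truth_score
        else:
            offset = rng.randint(-distortion, distortion)  # inclusive bounds
            scores[candidate] = secondary_score + offset
    return scores


def simulated_accuracy(candidates, trials=10000, seed=0):
    """Percent of trials in which the ground truth strictly outscores the rest."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        truth = rng.choice(candidates)
        scores = simulate_scores(truth, candidates, rng=rng)
        others = [s for c, s in scores.items() if c != truth]
        if all(s < scores[truth] for s in others):
            correct += 1
    return 100.0 * correct / trials


print("Example scores:", simulate_scores("L1", LANGUAGES))
print("Simulated accuracy: %.1f%%" % simulated_accuracy(LANGUAGES))
```

Under these assumptions an error occurs only when a distorted secondary score reaches or exceeds the ground truth score, so the two confused scores are always close, as described below.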

After extensive analysis, constant internal scoring was preferred as it was
determined to be more realistic than purely random scoring. When there is an error
made by the RL language identification system, the scores for the two languages being
confused are close. In addition, the ground truth score of 80 is more realistic than 95.
Finally,  all  scores  are  relatively  close,  simulating  a  more  realistic  distribution  than  when 
the  random  simulation  gave  a  spread  between  2  and  95. 

For  each  simulation,  iterations  were  performed  before  an  acceptable  score 
spread and simulation accuracy was achieved. This proved difficult, once again, for
the  speaker  identification  simulation.  The  amount  of  distortion  was  forced  to  be  as  high 
as  possible  and  the  constant  secondary  score  as  low  as  possible.  Final  values  for  all 
simulations  are  recorded  in  Appendix  C. 

5.3.1.2 DYNAMIC VS. REPRODUCIBLE

An investigation into dynamic versus reproducible distortion was performed. It
was originally planned to run simulations with and without ELINT, but with the same
internals and scores reproduced. It was also originally planned to reproduce the same
internals and score sequences through several different iterations of weights.

In this investigation it was discovered that reproducible distortion produces only
one fixed sequence of scores, over and over again, rather than replaying the sequence
from the previous experiment. This was discovered by performing several dynamic and
reproducible experiments and observing the language and platform sequences and
scores for the first transmission. In every reproducible simulation, only one score
sequence was produced. In the dynamic case, all sequences and scores were different.
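
The behavior observed above is what a pseudo-random generator restarted from one
fixed seed would produce. The sketch below illustrates that idea only; it is an
assumption about the mechanism, not the Testbed's actual implementation. The names
distorted_scores and FIXED_SEED are hypothetical, and the base score of 70 and
distortion of 11 are borrowed from the language ID example purely for illustration.

```python
import random

FIXED_SEED = 12345  # hypothetical; the Testbed's actual seed (if any) is unknown


def distorted_scores(n, base=70, distortion=11, reproducible=False):
    """Return n scores base + d, with integer d uniform in [-distortion, distortion]."""
    # Reproducible mode restarts from the same seed on every run, so every run
    # yields the same single sequence. Dynamic mode seeds from system entropy.
    rng = random.Random(FIXED_SEED) if reproducible else random.Random()
    return [base + rng.randint(-distortion, distortion) for _ in range(n)]


run_a = distorted_scores(5, reproducible=True)
run_b = distorted_scores(5, reproducible=True)
print(run_a == run_b)  # True: one fixed sequence, repeated on every run
```

Under this assumption a dynamic run cannot be replayed afterwards, which is consistent
with the finding that the reproducible mode does not repeat "the last experiment".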

In  the  quick  experiment  in  Chapter  3,  the  statistics  for  the  three  scenario  groups 
were  different  only  because  of  the  differing  numbers  of  transmissions  between  the  three 
scenario  groups.  The  score  sequences  were  exactly  the  same. 

In this experiment the reproducible score sequence will be used for one iteration
of the (1,2,3) scenario group. All other experiments will use dynamic scoring. Twenty
dynamic iterations of scenario groups (1,2,3), (4,5,6), and (7,8,9) will be run to
reduce the risk of bias. In all cases, the realistic simulation of internals is provided as
described.

5.3.2  EXTERNALS 

Externals, in the real world, are provided by measurements obtained from the
radio station listener using a specially designed system. RL has no in-house access to this
system, thus a simulation was used in these experiments. Each scenario group contains
different external values; however, all experiments using the same scenario group used
the same external values. As described in Section 1.3.2, the simulation of the externals
was realistically created based on actual Air Force activities.

Though the preliminary experiment results were not statistically significant, the
best external combination of frequency, radio type, and location was chosen as
indicated in Appendix C. Frequency was chosen due to its strong ties to the internals and
scenarios. Direction and modulation type were not chosen due to the conflicts which
appeared to arise in Chapter 3.

Operationally, a radio station listener manually tunes the radio and locks onto a
frequency, and so has direct control over receiving good frequency and location
information. In addition, the radio type is derived from the knowledge of the radio
station listener. Thus, the confidence measures of these externals were kept at the
default value of 100.

The Externals Processor relative weights were analyzed and are listed in Appendix
C at the system default values (100), since they are only used as multiplicative factors in
the algorithm. Raising or lowering one value would unscientifically favor the
contribution of one external over another. The frequency error tolerance was changed
to 0.0001 MHz, reflecting a more operationally realistic number.

5.3.3 ELINT

RL has no physical way to recreate the actual ELINT subsystem at present for live
input to the Testbed, thus the data is simulated as accurately as possible. The
simulation's realism was created by HRB-Systems using a former ELINT analyst.

ELINT Processor relative weights were analyzed and kept at the system default
values (100) for the same reason described for the Externals Processor relative weights.
The ELINT confidence score was set at 80, showing operationally relevant confidence in
the data provided by the ELINT subsystem. The relevant time and distance threshold
and error tolerance parameters were kept at operationally significant default values.
These values are recorded in Appendix C. In addition, ELINT is designated as "ON" if
ELINT is used and "OFF" otherwise.

5.3.4  OTHER  EXPERIMENTAL  PARAMETERS 

Values for "cutoff", "beta", and "gamma" are listed in Appendix C at their default
values. Analysis showed that the contractor set these at acceptable values.

The correlation algorithm (3), Externals Processor algorithm (2), ELINT Processor
algorithm (2) and their respective training files (listed in Appendix C as
ASSOCIATIONS.DAT, RELATIONS_ELINT.DAT and RELATIONS.DAT) were kept at the system
defaults. The algorithms are claimed by HRB-Systems (Shirey and Morgan, 1993) to be
the best. The training files represent 100% of the database, which during testing usually
provides the best performance.

Appendix C shows a stop time set at 27000 ms, or 45 minutes, which is the entire
length of the simulations, to provide as much data as possible for analysis. It also shows
a run speed of MAX, indicating the experiments will be run as fast as possible. In
addition, the third flight path (C) was chosen which, as indicated in Chapter 4, does not
affect these results since the direction external is not used.

As indicated in Section 5.3.1.2, scenario groups (1,2,3), (4,5,6), and (7,8,9) are
being used in these experiments. These groups are indicated in Appendix C after the
CW in the experiment name at the top of the page as well as individually.

Finally, the maximum transmission length was set at 3 seconds. An experiment
showed that, no matter where this parameter was set, the E/I Data Fusion Testbed gave
the same results, but displayed the information on the screen faster.

5.3.5  CORRELATION  WEIGHTS 

For each experiment, the correlation merge weights are listed in the appropriately
labeled portion of Appendix C. In every row, the weights are all equal to keep the number
of different experiments needed to a reasonable number. This scheme is justified since the
internal weightings are independent of one another in the merge algorithm.

Each experiment will change these weights as described in Section 5.3.5.1 and
these alterations are recorded on the spreadsheet in Appendix C. The experiment name at
the top of the Appendix also includes the weights in the order internals/externals/ELINT.

5.3.5.1  EXPERIMENT  SERIES 

This  section  describes  the  series  of  experiments  planned  to  measure  the  variability 
caused  by  changing  the  weights  of  the  input  processes  to  the  merge  algorithm  and  their 
effect  on  the  correlation  results. 

The  list  of  experiments  is  provided  in  Appendix  D  using  the  form  of  the  experiment 
name  described  in  Section  5.3.5.  The  table  involves  all  combinations  of  weights  in  intervals 
of  25.  This  interval  was  determined  as  a  tradeoff  between  the  amount  of  spacing  and  a 
reasonable  amount  of  time  and  number  of  experiments. 

As  described  in  Section  5.1,  in  this  experimental  design  internals  and  externals  will 
always  be  produced  (never  0)  for  correlation.  Due  to  its  poor  performance,  ELINT  is  not  used 
in  some  experiments. 

A short experiment showed that, with ELINT turned off, a non-zero ELINT weight
changes the final results when compared to a zero ELINT weight. This was
reported to the contractor as an error, since the software should have automatically
made the ELINT weight zero whenever ELINT was off. All experiments without ELINT were
performed with a zero weight.

A short experiment also showed that experiments with all weights of 100
provided answers identical to those obtained with all three weights set at 75.
Experiments whose weights are multiples of those of a previously listed experiment, and
therefore have the same weight ratio, were not performed and are identified in
Appendix D.
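
The resulting set of weight groups can be sketched as follows. This is an illustration
under stated assumptions, not the procedure used to build Appendix D: it assumes
internal and external weights of 25, 50, 75, or 100, ELINT weights of 0 through 100 in
steps of 25, and that the largest multiple of each ratio is the one retained (for
example 100/100/100 rather than 75/75/75). The function name weight_groups is
hypothetical.

```python
from itertools import product
from math import gcd

INTERNAL_EXTERNAL = (100, 75, 50, 25)  # never zero in this design
ELINT = (100, 75, 50, 25, 0)           # zero means ELINT is not used


def weight_groups():
    """List internals/externals/ELINT weight groups, skipping scaled repeats."""
    seen = set()
    groups = []
    for internal, external, elint in product(INTERNAL_EXTERNAL, INTERNAL_EXTERNAL, ELINT):
        # gcd(x, 0) == x, so an ELINT weight of zero is handled naturally.
        g = gcd(gcd(internal, external), elint)
        ratio = (internal // g, external // g, elint // g)  # 75/75/75 -> (1, 1, 1)
        if ratio not in seen:  # the first (largest) listing of each ratio is kept
            seen.add(ratio)
            groups.append(f"{internal}/{external}/{elint}")
    return groups


groups = weight_groups()
print(len(groups))  # 66
```

Under these assumptions the count of 66 distinct groups agrees with the 66 weight
groups referred to in the analysis of Section 5.5.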

Experiments  in  Appendix  D  are  reproduced  for  the  scenario  groups  (4,5,6)  and 
(7,8,9)  with  their  experiment  names  coded  appropriately. 

5.3.6  ROUTING  TABLE 

Based  on  the  scenario  groups,  a  routing  table  was  established.  The  objective  of 
the  routing  table  was  to  assign  the  radio  stations  of  interest  to  the  appropriate  listener  as 
described  in  Section  1.1.3. 

Operationally, the initial search criterion is language, as this routing criterion sends
the appropriate stations to a listener with special education and training in
understanding the language. The second most important search criterion is activity, as
some news reports are more critical than others. Finally, the last important search
criterion is speaker, since known news reporters provide more information than disc
jockeys.

Since a complement of twelve listeners has operational relevance, the initial
routing attempt was to establish twelve different routing assignments for each of the
three scenario groups.

It  was  initially  desired  that  each  scenario  group  have  its  own  routing  table. 

Initially, each three-scenario combination was examined for all possible languages
and activities of interest. Attempting to provide for twelve positions with only six
language/activity pairs for the scenario (1,2,3) group was unrealistic, so it was decided
to use only six listeners. Typically, there are more language/activity pairs and more than
three scenarios.

Only three important language/activity pairs were detected in the scenario (4,5,6)
group. These activities could have been divided among two different listeners to cover
the desired six positions, but this is operationally unrealistic. Due to this scarcity of
language/activity pairs, it was decided to create one realistic twelve-position routing
table to be used with all three scenario groups.

The final output is contained in Appendix E. The Appendix shows the 12 listeners
(P1-P12) with their routing responsibilities coded in the following format: scenario
number/language number:speaker number;activity number.

This routing table was prepared by initially creating the language/activity pairs,
then assigning each pair to a unique listener. No one listener monitors for two different
activities in the same scenario group. No one listener monitors for more than one
language. In addition, P4 not only listens for A1 in the scenario group (1,2,3) but also in
the scenario group (7,8,9).

Finally, representative speakers were added to the language/activity pairs. These
speakers represent the most active speakers in the language/activity pairs. The E/I Data
Fusion Testbed default routing table was replaced by the configuration specified in
Appendix E and used for all experiments.

5.3.7  RECORDING  STATISTICS 

Appendix  F  shows  the  audit  file  statistics  recorded  for  analysis  for  each  iteration  of 
each  experiment.  The  internal  simulation  accuracies  are  recorded  as  well  as  the 
external  based  activity  results.  Improvement  by  the  merge  algorithm  is  automatically 
calculated  in  the  spreadsheet  by  subtracting  these  numbers  from  their  respective 
recorded  merge  algorithm  accuracies.  Similarly,  correlation  improvements  are 
calculated  in  the  same  manner  using  the  respective  correlator  hypotheses,  as  in  the 
experiment  in  Chapter  3. 

In the correlator hypotheses section, the percent of time the top LSAP hypothesis
was correct was recorded. The percentage of time any of the multiple LSAP
hypotheses was correct was also recorded and is normalized by the number of hypotheses
generated, since all experiments produce a different number of hypotheses.

Some of the statistics used for analysis are labeled with either a direct or indirect
relevance. For instance, when optimizing the weights in the merge algorithm, the
merged scores are calculated independently of each other and have direct relevance.
Since the correlation hypotheses and routing statistics are calculated using the merged
scores and some internal-internal correlation, it is less likely that these results can be
attributed to the quality of the merge weights; thus they are indirectly related.

5.4  EXPERIMENTAL  PERFORMANCE 

Experiments were performed on the Testbed in an automated fashion. The three
separate scenario groups and the fourth repeatable group were established with the
values described in Section 5.3. All four groups had a version with and without ELINT,
creating eight separate experiment files.

In order to change the weights for each experiment listed in Appendix D, the
eidefaults file was edited. The command file was also edited to perform
experiments and store results for each of the four scenario groups. Initially, a specially
designed E/I Editor was used. Later, use of the Testbed's Openwindows text editor was
learned, providing a quicker means to change the weights and create additional
experimental data.

The Testbed saved the experimental data and step-by-step results in an audit file
for later review and statistical recording. A method to create a short version of this
audit file existed on the Testbed. However, after automatically running through several
weight groups and scenario groups with this short version, an error was detected in the
generated statistics. All iterations processed in this fashion were rerun in the
memory-intensive step-by-step fashion.

Memory problems on the Testbed resulted after saving several long step-by-step
audit files. As a result, experiments whose results could not be saved for lack of
memory had to be repeated. Using the UNIX head and tail functions in the
command file eliminated this problem by saving only the first 100 lines (to verify the
experimental conditions) and the last 100 lines (to print the audit file statistics) of
each audit file.
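
The effect of keeping only the two ends of an audit file can be sketched in Python as
follows. The experiments themselves used the UNIX head and tail utilities from the
command file; the function head_and_tail below is a hypothetical stand-in that mimics
their combined effect on a sequence of lines.

```python
from collections import deque
from itertools import islice


def head_and_tail(lines, n=100):
    """Return the first n and the last n items of an iterable of lines.

    As with running head and then tail on one file, the two parts overlap
    when the input has fewer than 2 * n lines.
    """
    it = iter(lines)
    head = list(islice(it, n))
    # A bounded deque keeps only the most recent n lines seen.
    tail = deque(head, maxlen=n)
    tail.extend(it)
    return head, list(tail)


# Synthetic 250-line "audit file" used only to exercise the function.
audit = [f"line {k}" for k in range(1, 251)]
head, tail = head_and_tail(audit)
print(head[0], "|", tail[0], "|", tail[-1])  # line 1 | line 151 | line 250
```

Only 2n lines are ever held in memory, regardless of the length of the audit file.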

To create the data for analysis, these audit file results were recorded. Initially, the
results were hand-written onto paper copies of Excel spreadsheets created for each
scenario group. This process proved to be slow and tedious. Eventually the statistics
from the audit file were printed for each scenario and weight group using the
Openwindows Print Monitor.

All printed statistics for each scenario group and weight group were typed into
Microsoft Excel spreadsheets illustrated in Appendix F. These spreadsheets were
configured to automatically generate the required means and standard deviations for
analysis.

Following  generation  of  all  spreadsheets,  the  data  was  analyzed  for  accuracy 
using  the  standard  deviation  statistic.  After  the  observation  of  several  spreadsheets,  any 
deviation  statistic  greater  than  two  was  rechecked.  This  resulted  in  the  correction  of 
several  typographical  errors  resulting  from  the  manual  data  entry  procedure. 
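
A minimal sketch of this recheck rule is given below. It assumes the sample standard
deviation (the form computed by statistics.stdev); the source does not state which
form the spreadsheets used. The statistic names and values are hypothetical and are
not taken from the appendices.

```python
from statistics import stdev

THRESHOLD = 2.0  # standard deviations above this were rechecked


def columns_to_recheck(table):
    """Return the names of statistics whose standard deviation exceeds THRESHOLD.

    `table` maps a statistic name to its values over the iterations.
    """
    return [name for name, values in table.items() if stdev(values) > THRESHOLD]


table = {
    "language_improvement": [4.0, 5.0, 4.5, 5.5],
    "speaker_improvement": [3.0, 3.5, 30.5, 3.0],  # 30.5 mimics a typing slip for 3.5
}
print(columns_to_recheck(table))  # ['speaker_improvement']
```

A single mistyped entry inflates the deviation of its column well past the threshold,
which is why the check was effective at catching manual data entry errors.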

Other errors due to manual data entry were detected and corrected in the
language identification improvement formulas, the activity correct statistics, and the
routing statistics. Since the external and ELINT data was never altered during the 20
iterations of each scenario group, the merge algorithm's activity correct statistic was
checked to be constant for all iterations. Due to the way the routing table was
created, the mean routing partially correct and the mean routing accuracy statistics
were checked to be equal.

5.5  EXPERIMENTAL  ANALYSIS 

Final statistics generated by the experiments and used for analysis are contained
in Appendix G. These statistics illustrate the average improvements and routing
percentages obtained after the merge and correlation algorithms for the 20 iterations of
scenario groups (1,2,3), (4,5,6), and (7,8,9). The final statistics also include these same
results for the one reproducible iteration of scenario group (1,2,3). The merge and
correlation algorithm statistics were summed to create a number signifying the total
improvement gained by these two algorithms.

Ranks were created separately for the results of the merge and correlation
algorithms using the Microsoft Excel sort function. These ranks were predominantly used
in the analysis stage. Though the data in Appendix G was created as carefully as
possible, since there was no way to reproduce the same twenty iterations of internal
data through all weight groups, the individual statistics are subject to error. However, it
is later shown in this thesis that despite these errors, the ranking results for the twenty
random iterations of scenario group (1,2,3) are comparable with the one repeatable
iteration of (1,2,3), supporting the use of these ranks for this analysis and the conclusions
made in this experiment.

In the preliminary analysis stage, the merge algorithm statistics, the correlation
hypothesis results, the correlation internal improvements and correlation routing statistics
were separately ranked as illustrated in Appendix H. As can be noted by the ranks for
the correlation hypothesis results and the routing statistics for weight group 75/50/0,
these rankings were sometimes very contradictory within the scenario group, and these
contradictions made it very difficult to identify definite trends. Contradictions
among scenario groups were just as plentiful.

To alleviate this difficulty, a correlation total was calculated and ranked as
illustrated in Appendix G. Intuitively, this makes sense since the sum of the correlation
hypothesis results, internal improvements, and routing statistics accurately reflects the
quality of this algorithm.

For each column in Appendix G, a grand total is calculated for each of the
weight groups by summing the merge and correlation totals. This grand total reflects
the overall improvement score for a particular weight group.

Using the Microsoft Excel sort function, ranks from best to worst were created for
the merge total, correlation total and grand total. It was discovered that when adding
the merge and correlation totals in this fashion, the ranking for the grand total was
strongly biased in favor of the correlation total since it was generally 7 times larger than
the merge total. In order to lessen this bias, the grand totals in Appendix G are
calculated by multiplying the merge total by 7, then adding the correlation total.
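
The weighted grand total and its ranking can be sketched as follows. The merge and
correlation totals below are hypothetical, the function names are illustrative, and
the sketch assumes that a larger total is better and that there are no ties; the
actual ranking was done with the Microsoft Excel sort function.

```python
MERGE_SCALE = 7  # correlation totals were generally about 7 times the merge totals


def grand_total(merge_total, correlation_total):
    """Scale the merge total so both algorithms contribute comparably."""
    return MERGE_SCALE * merge_total + correlation_total


def rank_best_first(totals):
    """Map each weight group to its rank, with rank 1 for the largest total."""
    ordered = sorted(totals, key=totals.get, reverse=True)
    return {group: rank for rank, group in enumerate(ordered, start=1)}


# Hypothetical (merge total, correlation total) pairs for three weight groups.
results = {
    "100/100/0": (6.0, 30.0),
    "75/50/25": (5.0, 40.0),
    "25/50/100": (1.0, 20.0),
}
totals = {group: grand_total(m, c) for group, (m, c) in results.items()}
ranks = rank_best_first(totals)
print(ranks)
```

Without the factor of 7, a difference of one point in the merge total would carry far
less influence on the rank than the same relative change in the correlation total.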

Finally, the sum of the merge and correlation algorithm ranks was created and
separately ranked, and this ranking is the way the results are displayed in Appendix G.
Once the merge statistics were multiplied by seven, this ranking and the ranking for the
grand total were almost identical.

Data analysis was done in three different ways with the results shown in Tables 5
and 6 and Figures 3-8. This data is used to perform the comparison of weights in Section
5.5.1.

The  first  method  of  analysis  began  with  a  judgment  of  each  of  the  individual 
numbers  in  Appendix  G.  Each  row  of  each  scenario  group  was  analyzed  separately 
and  significant  negative  numbers  and  other  deviations  noted.  An  example  of  a 
deviation  is  for  scenario  group  (1,2,3)  noting  that  the  statistics  obtained  when  the 
internal  weight  is  25  performed  worse  than  all  other  internal  weights. 

The  correlation  algorithm  LSP  improvement  statistic  was  not  analyzed  since  when 
the  individual  LSP  statistics  were  noted  as  deviant,  the  total  LSP  was  similarly  affected. 

In  addition,  the  routing  statistics  completely  correct,  partially  correct  and  accuracy 
were  not  examined  since  the  differences  from  experiment  to  experiment  were  small. 

Table 5 gives the results of this first analysis. In this table, the weight groups with
nothing noted are not listed. Deviant or negative statistics are denoted as A for
scenario group (1,2,3), R for repeatable group (1,2,3), B for scenario group (4,5,6), and
C for scenario group (7,8,9).

It was also generally noted while performing this analysis that, as the amount of
ELINT decreased, the results seemed to improve. Another general trend noted at
this time was the poorer performance of the correlation language, speaker, activity and
platform improvements compared with the results of the merge algorithm.

A second analysis was done by listing the best seven (10% of the 66 total) and
worst  seven  weight  groups  based  on  the  merge  rank  and  total  rank  for  the  four 
scenario  groups.  Since  the  correlation  rank  is  not  directly  a  measure  of  the  alteration  of 
the  merge  weights  (some  internal-internal  correlation  and  alternate  hypothesis 
generation  occurred),  its  trends  were  not  examined  separately.  No  analysis  of  the  rank 
for  the  final  totals  in  Appendix  G  was  performed  since  this  ranking  closely  follows  the 
ranking  of  the  sum  of  the  merge  and  correlation  algorithm  ranks.  Table  6  illustrates  the 
results  of  this  analysis. 

The third method of analysis compared accumulated total ranks. For example, to
examine the effect of changing the ELINT weight, 100/100/100 was placed in a
spreadsheet side by side with 100/100/75, 100/100/50, 100/100/25, and 100/100/0. Next,
100/75/100 was placed side by side with 100/75/75, 100/75/50, 100/75/25, and 100/75/0.
This continued for all combinations of the internal and external weights.

Following creation of this spreadsheet, the merge ranks and total ranks with the
same ELINT weight were added together. Graphs were made for analysis and
comparison illustrating the sum of the ranks for each ELINT weight. Graphs were also
produced for internals and externals in the same manner and are included in
Figures 3-8.
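
The accumulation of ranks by weight can be sketched as follows. The ranks shown are
hypothetical and the function name rank_sum_by_position is illustrative; the actual
sums were produced in a spreadsheet.

```python
from collections import defaultdict


def rank_sum_by_position(ranks, position):
    """Sum ranks over all weight groups that share the weight at `position`.

    `ranks` maps "internals/externals/ELINT" names to a rank;
    position 0 = internals, 1 = externals, 2 = ELINT.
    """
    sums = defaultdict(int)
    for group, rank in ranks.items():
        weight = int(group.split("/")[position])
        sums[weight] += rank
    return dict(sums)


# Hypothetical ranks for a few weight groups.
ranks = {
    "100/100/100": 40, "100/100/75": 31, "100/100/50": 20,
    "100/100/25": 9, "100/100/0": 4,
    "100/75/100": 45, "100/75/0": 6,
}
elint_sums = rank_sum_by_position(ranks, position=2)
print(elint_sums)
```

Since rank 1 is best, a lower accumulated sum indicates a better weight value, which
is how the graphs of these sums are read.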

5.5.1  COMPARATIVE  DATA  ANALYSIS 

The  similarity  of  results  for  scenario  group  (1,2,3)  and  the  one  repeatable  iteration 
of  (1,2,3)  is  verified  by  the  data  analysis.  In  Table  5,  A  and  R  are  listed  in  the  same 
column  46  times  as  opposed  to  either  being  listed  alone  6  times.  In  Table  6,  the  same 
three  weight  combinations  appear  in  23  of  28  columns.  Finally,  the  weight  values  in 
Figures  3-8  are  extremely  close,  sometimes  exactly  overlapping,  further  supporting  this 

BEST MERGE RANK    (1,2,3)       R(1,2,3)      (4,5,6)       (7,8,9)
1                  100/25/0      100/75/0      75/25/50      75/25/0
2                  100/100/0     75/50/0       100/75/25     100/25/0
3                  75/25/0       100/50/0      100/50/25     75/50/0
4                  75/50/0       75/25/0       75/75/25      100/50/25
5                  75/50/25      100/25/0      100/50/50     75/75/25
6                  100/50/0      100/100/0     75/25/25      100/75/50
7                  100/50/25     75/75/25      100/75/50     75/25/25

BEST TOTAL RANK    (1,2,3)       R(1,2,3)      (4,5,6)       (7,8,9)
1                  75/50/25      100/75/0      75/75/25      75/50/0
2                  75/50/0       75/50/0       75/25/50      75/75/25
3                  100/100/25    100/100/25    100/50/25     100/100/0
4                  100/100/0     75/75/25      100/75/75     75/25/0
5                  75/100/25     75/100/25     50/50/75      100/75/0
6                  75/75/25      100/50/0      100/50/100    100/50/25
7                  75/100/0      75/50/25      75/50/50      100/50/0

WORST MERGE RANK   (1,2,3)       R(1,2,3)      (4,5,6)       (7,8,9)
60                 25/75/50      25/100/75     25/75/50      25/100/100
61                 25/100/100    25/75/50      25/75/75      25/100/25
62                 25/100/75     25/100/100    25/100/50     25/50/75
63                 25/50/75      25/75/100     25/100/100    25/75/50
64                 25/25/100     25/50/75      25/25/75      25/50/100
65                 25/75/100     25/25/100     25/75/0       25/100/75
66                 25/50/100     25/50/100     25/100/0      25/75/100

WORST TOTAL RANK   (1,2,3)       R(1,2,3)      (4,5,6)       (7,8,9)
60                 50/25/100     50/25/100     25/100/50     25/25/75
61                 25/50/75      25/100/50     25/100/0      25/100/75
62                 25/50/100     25/50/75      25/100/25     50/25/100
63                 25/25/75      25/25/75      25/50/100     25/100/100
64                 25/25/100     25/25/100     25/100/100    25/100/50
65                 25/75/100     25/100/75     50/25/75      25/25/100
66                 25/100/75     25/75/100     25/25/75      25/75/100

TABLE 6: BEST/WORST ANALYSIS

The analyzed data tables and figures were used, where possible, to determine
the best and worst weights for internals, externals, and ELINT. Though the overall goal
was to find the best, it was important to point out those values which should never be
considered to be the best. In some cases, the best or worst analysis was initially
narrowed to two or three choices, which were then compared and conclusions made.

It is concluded that having a weight of 25 for the internals is unacceptable for all
four scenario groups. In Table 5, this is illustrated by its being the internal weight with
the greatest number of letters A, B, C, and R. In addition, all of the combinations in the
"WORST" columns in Table 6 except four have an internal weight of 25. Finally, Figures 3
and 4 show much higher total ranks for this weight than any other.

Data analysis also supports the conclusion that an internal weight of 75 or 100 is
the best. In Table 5, there are very few occurrences of these internal weights and in
most cases the ELINT weight values contribute significantly to these poor results. For
example, in the scenario groups labeled A, C, and R, the ELINT weights are high in all
cases (50, 75, 100) with one exception (75/25/25). It can be observed in Figures 5 and 6
that these ELINT weights contribute to poorer performance by their higher total ranks for
these groups. In addition, for the scenario group labeled B the same thing can be
shown with two exceptions (weight combinations 100/50/50 and 100/25/50).

Table  6  also  illustrates  that  best  results  are  achieved  with  internal  weights  of  75  or 
100.  All  of  the  combinations  listed  as  "BEST"  in  this  table  contain  either  of  these  two 
weights  except  for  one.  This  is  further  reinforced  by  the  low  total  ranks  for  these  weights 
in  Figures  3  and  4. 

There is no strong indication which of these internal weights is better. The
numbers appear in the "BEST" columns of Table 6 almost equally (27 versus 28). In
addition, Figures 3 and 4 show a lower total rank for the 100 weight, but not by a very
significant amount.

The analysis of ELINT weights provides different conclusions for scenario group
(4,5,6) as compared with the other scenarios. For group (4,5,6), 25 and 50 are the best,
with 0, 75, and 100 being the worst. For the other 3 scenario groups 0 and 25 are the
best, with 75 and 100 being the worst.

For scenario groups (1,2,3), (7,8,9) and the repeatable group (1,2,3), each of the
weights judged as worst (75 and 100) individually has many more letters (A, C, R) in the
columns of Table 5 than the other weights. In addition, Table 6 has these numbers in
the ELINT position in 36 of the 42 "WORST" weight groups. Finally, it can be plainly seen in
Figures 5 and 6 that total ranks for these weights are higher than all others for these
scenario groups.

As far as which one is the worst value for these scenario groups, it is concluded to
be the 100 weight. This weight is in the "WORST" columns of Table 6 twenty-two times
compared to fourteen for the 75 weight. In addition, Figures 5 and 6 show the 100
weight consistently with a higher total rank than the 75 weight.

The best weights for these three scenario groups are concluded to be 0 and 25.
They are listed only twice in Table 5 without all the other scenario groups and are in
every one of the "BEST" ranking combinations in Table 6 except one. Finally, in Figures 5
and 6, ELINT weights of 0 and 25 have the lowest total ranks for these three scenario
groups.

The weight of 0 is concluded to be the best for these scenario groups. It occurs
more times in Table 6 in the "BEST" columns (25 versus 16) and is the lowest of the total
ranks in Figures 5 and 6, making it a better choice than 25.

For the (4,5,6) scenario group, Table 5 shows some activity and platform
identification instability in the merge algorithm (illustrated in these rows as a B alone).
Further analysis of this table illustrates that there are many B's in the columns with 75 and
100 ELINT weights. In addition, there are a few B's in columns with a zero ELINT weight
which are not present for the other groups.

The weight groups in the "WORST" columns of Table 6 for this (4,5,6) scenario group
are not dominated by one ELINT weight, though 75 occurs more than any other (4) and
there are three occurrences each of 100 and 0. Since there are three 50's as well, this
table only reinforces in a small way the poor performance of weights 0, 75 and 100 for
this scenario group.

The graphs in Figures 5 and 6 clearly illustrate the poor performance of these
weights for this scenario group. These total ranks are much higher than the ranks for the
25 and 50 weights. However, not one of these tables clearly indicates which weight value
is worse than the others.

For  this  scenario  group,  ELINT  weights  of  25  and  50  are  considered  the  best.  They 
have  the  fewest  number  of  B's  in  Table  5.  In  Table  6,  these  values  are  in  all  of  the  "BEST" 
weight  groups  in  the  merge  rank  table  and  in  four  of  the  seven  weight  groups  in  the 
total  rank  table,  including  the  top  three.  Finally,  these  two  weights  have  the  lowest  total 
ranks  in  Figures  5  and  6. 

As for concluding which of these two weights is better, Table 6 provides no
information since they appear an equal number of times in the "BEST" tables. However,
Figures 5 and 6 show the weight of 25 with a lower total rank, making it the best
for this group.

An  analysis  was  performed  as  to  why  the  zero  weight  was  the  best  for  one 
scenario  group  and  the  worst  for  the  (4,5,6)  scenario  group.  The  differences  were 
attributed  to  the  merge  algorithm  platform  and  activity  instabilities  discovered  during 
the  analysis  of  Table  5. 

A detailed examination of the merge algorithm platform identification
improvement statistic was undertaken for the (4,5,6) group to explain these differences.
For this group, the comparative results for the ELINT weight 0 were better than both the 25
and 50 results only 6% of the time. In addition, the results for the 0 weight were better
than one of these results only 25% of the time (this result includes the 6%).

This is completely contrary to the results for the other scenario groups, contributing
to the ranking differences in Figures 5 and 6. The repeatable (1,2,3) scenario group
always had better results for the 0 weight. For the (7,8,9) scenario group, the results
for the 0 weight were better than both the 25 and 50 results 50% of the time and better
than one of them 75% of the time. Finally, for the (1,2,3) scenario group, the 0 weight
was better than both the 25 and 50 weight results 50% of the time and better than one or
the other 75% of the time.
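
The "better than both" and "better than one" percentages quoted above can be tallied with a few lines of code. The following is only a minimal sketch: the function name `win_rates` and the sample values are hypothetical and are not taken from the Testbed or from the thesis data. It assumes that a larger improvement statistic is the better result and, as in the text, the "better than one" figure includes the "better than both" cases.

```python
from typing import Sequence, Tuple


def win_rates(zero: Sequence[float],
              w25: Sequence[float],
              w50: Sequence[float]) -> Tuple[float, float]:
    """Return (share where weight 0 beats both, share where it beats at least one)."""
    if not (len(zero) == len(w25) == len(w50)) or not zero:
        raise ValueError("need three equal-length, non-empty result lists")
    # Count comparisons where the 0-weight result exceeds both other results.
    beats_both = sum(z > a and z > b for z, a, b in zip(zero, w25, w50))
    # Count comparisons where it exceeds at least one (this includes the cases above).
    beats_one = sum(z > a or z > b for z, a, b in zip(zero, w25, w50))
    return beats_both / len(zero), beats_one / len(zero)


# Hypothetical platform-improvement results for four comparisons (not thesis data).
zero_results = [10.0, 5.0, 7.0, 3.0]
w25_results = [8.0, 6.0, 9.0, 4.0]
w50_results = [9.0, 4.0, 8.0, 5.0]

both, at_least_one = win_rates(zero_results, w25_results, w50_results)
print(f"better than both: {both:.0%}, better than at least one: {at_least_one:.0%}")
```

With real data, each list would hold the statistic for one ELINT weight across all comparable external/internal weight pairs.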

The differences in ranking caused by the merge algorithm activity identification
statistic were also examined in detail. For the (4,5,6) scenario group, this statistic generally
decreased from 5.68 to zero when the ELINT weight was 0. In the other scenario groups, the
ELINT weight of zero increased this statistic by a very small amount. A 5.68 drop in the
merge total and final total for the (4,5,6) scenario group equates to about a 15 place
increase in rank (that is, a worse ranking). Thus, this statistic contributed to the (4,5,6)
group's ranking differences.
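
The effect of such a drop on rank can be illustrated with a small sketch. It assumes, consistent with the tables in the appendices, that the largest total receives rank 1; the function name `rank_by_total`, the combination labels, and the totals are hypothetical, and ties are simply broken by input order.

```python
from typing import Dict


def rank_by_total(totals: Dict[str, float]) -> Dict[str, int]:
    """Rank weight combinations so that the highest total receives rank 1."""
    ordered = sorted(totals, key=totals.get, reverse=True)
    return {combo: place for place, combo in enumerate(ordered, start=1)}


# Hypothetical merge totals for four weight combinations (not thesis data).
totals = {"combo A": 93.20, "combo B": 91.27, "combo C": 90.44, "combo D": 88.77}

before = rank_by_total(totals)
# Remove the 5.68 activity contribution from one combination and rank again.
after = rank_by_total({**totals, "combo A": totals["combo A"] - 5.68})

print(before["combo A"], "->", after["combo A"])
```

In this toy case the combination falls from first to last place; with the roughly fifty combinations of the detailed experiment, whose totals lie close together, the same drop moves a combination many more places.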

For  externals,  a  weight  of  25  is  concluded  to  be  one  of  the  worst  values.  In  Table  5,  an 
A,  B,  C  or  R  appears  more  times  in  these  columns  than  any  other  external  weight.  This  weight 
also  appears  many  times  in  the  "WORST"  columns  of  Table  6.  In  addition,  though  there  is  not  a 
significant  difference  in  the  ranks  in  Figure  7,  with  the  exception  of  scenario  group  (4,5,6),  this 
weight  is  never  the  best.  Finally,  the  ranks  in  Figure  8  clearly  illustrate  that  this  weight  is  the 
worst  since  the  total  ranks  are  much  higher  than  any  other. 

The external weight of 100 is also concluded to be one of the worst values. Though
this weight has few letters alone in Table 5, it appears more times than any other external
value in the "WORST" columns of Table 6. In addition, it ranked near the poorest in Figure 7. In
Figure 8, it appeared as the best for some scenario groups, though not by a significant
enough amount to negate this conclusion.

As for the best external weight, a value of 50 is not listed much in
Table 5 and appears most often in the "BEST" ranking columns in Table 6. In
addition, in Figures 7 and 8 the scenario group (4,5,6) clearly shows the 50 weight as the best.
For the other scenario groups, Figures 7 and 8 show that this weight is either the best or not
significantly different from the best.

Considering all three weights at the same time, analysis shows that the weight
combination 75/75/25 is the only combination that appears on all of the "BEST" total rank
tables and three of four "BEST" merge ranking tables in Table 6. In addition, further
analysis showed that the "BEST" total rank and merge rank weight combinations which
provided optimal results in this table (#1 on the list) were never the same.
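
Finding a combination that appears on every "BEST" list amounts to a set intersection. The sketch below is illustrative only: `common_to_all` is a hypothetical helper, and the three lists are made-up stand-ins for the "BEST" columns of Table 6, not the actual table contents.

```python
from typing import Iterable, Set


def common_to_all(best_lists: Iterable[Iterable[str]]) -> Set[str]:
    """Return the weight combinations that appear in every one of the given lists."""
    sets = [set(lst) for lst in best_lists]
    return set.intersection(*sets) if sets else set()


# Made-up "BEST" lists for three scenario groups (not the contents of Table 6).
best_lists = [
    ["75/75/25", "100/50/25", "75/50/0"],
    ["75/25/50", "75/75/25"],
    ["75/75/25", "75/100/25", "100/100/0"],
]

print(common_to_all(best_lists))
```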

In most cases, as indicated above, the correlation algorithm's speaker, language, activity,
and platform improvements were less than those provided by the merge algorithm. It is
concluded that the convention of generating alternate hypotheses for routing (the star
convention as described in Section 2.3) degrades the individual results.

5.6  SUMMARY  AND  CONCLUSIONS 

Tables 7 and 8 summarize the best and worst weights respectively for externals,
internals, and ELINT as concluded by these experiments.


SCENARIO GROUP    (1,2,3)      R(1,2,3)     (4,5,6)      (7,8,9)

EXTERNALS         50           50           50           50
INTERNALS         75 OR 100    75 OR 100    75 OR 100    75 OR 100
ELINT             0            0            25           0

TABLE 7: EXPERIMENTAL CONCLUSION: BEST WEIGHT VALUE(S)


SCENARIO GROUP    (1,2,3)      R(1,2,3)     (4,5,6)      (7,8,9)

EXTERNALS         25, 100      25, 100      25, 100      25, 100
INTERNALS         25           25           25           25
ELINT             100          100          0, 75, 100   100

TABLE 8: EXPERIMENTAL CONCLUSION: WORST WEIGHT VALUE(S)

Based on the data for these experiments, there is no unique weight group which
is optimal for all four experiment groups. The best scoring combination in each scenario
group was not the same, and the characteristics of the (4,5,6) group caused its
individual weight choices to be different from the others.

Considering the data, the combination 75/75/25 would be a good default for the
system. This combination appeared prominently in the "BEST" columns of Table 6. The
internal value (75) matches the value in Table 7. The external value (75) is neither
judged best nor worst; however, given the similarities between this weight and the 50
weight in Figures 7 and 8, this value is adequate. Finally, the ELINT value was judged as
best for scenario group (4,5,6) and a close second best for the other scenario groups,
making it an adequate choice for the system default.

It is recommended that any future system allow for the evaluation of the
effectiveness of routing transmissions based only on the best results of the merge
algorithm. Consistently correct LSAP decisions, with percentages in the nineties, were
obtained after the merge, and good routing statistics are thus expected for the best LSAP
combination. If successful, this would alleviate the need for the correlation algorithm,
which degraded the individual results but maximizes the routing percentages through the
generation of alternate hypotheses.

It is recommended that a capability to reproduce one scenario combination for
many experimental conditions be added to the system. Currently, only one
reproducible experiment could be run. Adding this capability would allow for more
realistic reproducible data to be run through different experimental conditions.

Chapter 6


THESIS  SUMMARY 


This  thesis  described  the  RL  Data  Fusion  Testbed  and  its  development  history.  All 
aspects  of  the  Data  Fusion  Testbed  are  described  in  detail  including  the  internals, 
externals,  merge  algorithm,  and  four  different  correlation  algorithms. 

This  thesis  had  a  heavy  emphasis  on  experimentation.  It  summarized  a  group  of 
ten  different  experiments  which  could  be  run  using  the  Testbed  and  performed  a 
preliminary  and  detailed  experiment. 

In the chapter which describes the ten experiments, the objective of each
experiment and a general statement of the methodology to run this experiment were
given. Finally, expected results are listed.

Each  of  these  experiments  was  ranked  on  the  following  criteria:  ease  of  use, 
interest  to  the  operational  community,  interest  to  the  speech  processing  community,  and 
interest  to  me.  This  ranking  was  used  to  select  the  detailed  experiment  performed. 

During  performance  of  this  thesis,  several  computer  skills  were  learned.  Among 
them  are  several  functions  from  the  UNIX  operating  system,  the  Openwindows  Text 
Editor  and  Print  Monitor,  and  a  method  to  list  and  delete  files  on  the  Sun.  These  skills  were 
used  extensively  to  automatically  change  weights,  run  experiments,  and  record  results 
during  the  detailed  experiment. 

This  thesis  detailed  a  preliminary  experiment  which  was  performed  to  learn  about 
Testbed  experimentation,  and  to  exercise  the  Testbed  simulator  and  audit  file.  This 
preliminary  experiment  had  as  its  objective  to  determine  which  set  of  externals  provides 
the  best  correlation  results.  Experimental  procedures  during  this  preliminary  experiment 
were  followed  during  the  detailed  experiment. 

Though the amount of data processed through the preliminary experiment did
not result in statistically significant results, the experiment concluded that the
combination of frequency, radio type, and location provided the best performance of
all external combinations. Performance was measured in terms of mean speaker
identification and LSAP (language, speaker, activity, and platform identification)
improvement.
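
As a worked illustration of this performance measure, the sketch below recomputes the mean and standard deviation of speaker identification improvement for the five-externals sample in Appendix B. The variable names are illustrative, and the use of the sample (n - 1) standard deviation is an assumption; it was chosen because it reproduces the 17.96 and 5.51 figures listed in that appendix.

```python
from statistics import mean, stdev

# Percent correct taken from the 5 EXTERNALS sample in Appendix B.
correlated_correct = [84.86, 88.33, 78.40]  # correlator hypothesis: speaker correct
internals_correct = [64.94, 66.11, 66.67]   # simulator: speaker correct

# Improvement is the gain of the correlated result over the internals alone.
improvement = [c - i for c, i in zip(correlated_correct, internals_correct)]

avg = mean(improvement)
sd = stdev(improvement)  # sample standard deviation (n - 1 in the denominator)

print(f"mean = {avg:.2f}, standard deviation = {sd:.2f}")
```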

The preliminary experiment also showed how different externals and combinations
of externals degraded E/I performance. Addition of the direction external to other
combinations seemed to hinder performance. The modulation type and radio type
externals seemed to work weakly together.

The  preliminary  experiment  also  showed  a  strong  correlation  between  the  external 
frequency  and  good  results  since  the  top  scoring  combinations  included  that  external. 
However,  it  was  noted  that  there  were  no  frequency  changes  for  the  scenarios,  an 
operationally  unrealistic  situation. 

The final conclusion of the preliminary experiment examined the performance of
the different numbers of externals. This concluded that, in accordance with data fusion
theory, the larger the number of externals the larger the improvement. The difference
between the four and five external cases was shown to be smaller than the difference
between the other steps.

The  detailed  experiment  had  as  its  objective  to  analyze  the  weighting  of  external, 
internal  and  ELINT  results.  This  experiment  was  carefully  designed  to  reach  valid 
conclusions. 

The design of the detailed experiment examined all Testbed parameters in an
attempt to be operationally realistic. The design decisions to use twenty iterations and
three separate scenario groups ((1,2,3), (4,5,6), and (7,8,9)), and to require that the number
of platforms and languages be equal to the number available, were an attempt to
minimize the experimental bias. In addition, a realistic routing table was designed and
implemented.

The internal simulations were designed to create operationally realistic accuracies
and score spreads. It was concluded through detailed analysis that constant internal
scoring with distortion provided more realistic data than random internal scoring.

It  was  during  the  design  of  the  internal  simulations  that  it  was  discovered  that 
there  was  only  one  way  to  reproduce  the  same  data  through  all  different  weight 
groups.  Thus,  in  addition  to  the  three  scenario  groups,  a  fourth  reproducible  group  was 
established. 

It was concluded through a very short experiment during the design phase that
changing the maximum transmission length parameter only changes the length of time
before the Testbed displays its results. There is no change in the experimental statistical
results.

A list of experiments was created to test the experimental objective. During the
design of this list, a short experiment proved that two different experiments where the
weights were multiples of each other provided the same results. For example, an
experiment with the three processes weighted 50/25/25 produced identical results to an
experiment with the weights 100/50/50. This lessened the number of experiments
required to be run during the detailed experiment.
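
The reduction can be sketched by treating proportional weight triples as one experiment. In the sketch below the weight grids (25 to 100 for the first two processes, 0 to 100 for ELINT) are read off the experiment list in Appendix D; the function names and the assumption that the third weight is the ELINT weight are illustrative only. Each triple is reduced by its greatest common divisor, and only the largest member of each proportional family is kept.

```python
from functools import reduce
from itertools import product
from math import gcd

FIRST = SECOND = (25, 50, 75, 100)   # weights of the first two processes
ELINT = (0, 25, 50, 75, 100)         # ELINT weight, 0 meaning ELINT is not weighted


def canonical(combo):
    """Reduce a weight triple to lowest terms so proportional triples compare equal."""
    g = reduce(gcd, combo)
    return tuple(w // g for w in combo)


def split_experiments():
    """Return (experiments to run, redundant experiments) over the full weight grid."""
    families = {}
    for combo in product(FIRST, SECOND, ELINT):
        families.setdefault(canonical(combo), []).append(combo)
    # Keep the largest member of each family, e.g. 100/50/50 rather than 50/25/25.
    to_run = sorted(max(members) for members in families.values())
    redundant = sorted(c for members in families.values()
                       for c in members if c != max(members))
    return to_run, redundant


to_run, redundant = split_experiments()
print(len(to_run), "experiments to run;", len(redundant), "redundant")
```

Under these assumptions the 80 possible combinations reduce to 66 distinct experiments, with 14 redundant ones, which agrees with the counts of performed and not-performed experiments in Appendix D.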

Following the performance of all experiments and calculation of the means,
Tables 5 and 6 and Figures 3-8 were created, representing three different ways of looking at
the data. During experimental analysis, the results obtained from the one reproducible
scenario group very closely followed the performance of the 20 different iterations of
the same scenario group for all ELINT, external and internal weights. This conclusion
reinforces the conclusions of this detailed experiment, even though there was no way to
run multiple iterations of reproducible data through all scenario groups.

Following data analysis, the best and worst weights were determined and are
repeated in Tables 9 and 10 respectively. These conclusions agreed with the
expectations predicted in this thesis.

SCENARIO GROUP    (1,2,3)      R(1,2,3)     (4,5,6)      (7,8,9)

EXTERNALS         50           50           50           50
INTERNALS         75 OR 100    75 OR 100    75 OR 100    75 OR 100
ELINT             0            0            25           0

TABLE 9: BEST WEIGHTS


SCENARIO GROUP    (1,2,3)      R(1,2,3)     (4,5,6)      (7,8,9)

EXTERNALS         25, 100      25, 100      25, 100      25, 100
INTERNALS         25           25           25           25
ELINT             100          100          0, 75, 100   100

TABLE 10: WORST WEIGHTS


The  expectation  that  the  conclusions  would  be  different  for  each  weight  group 
was  confirmed.  The  conclusions  for  the  (4,5,6)  group  were  different  than  the  others  due 
to  platform  and  activity  improvement  instability. 

A  75/75/25  weight  group  was  recommended  for  the  system  default  for  all  four 
scenario  groups.  Though  these  weights  were  not  rated  as  the  "BEST"  for  all  scenario 
groups,  analysis  showed  them  adequate  enough  to  be  recommended  as  the  default 
weights. 

Analysis also showed that the merge algorithm's LSAP improvements were
consistently greater than the correlation algorithm improvements. This questions the use
of the correlation algorithm.

During performance of these experiments three software errors were detected.
When ELINT is not used, the weight for ELINT is not set to 0, causing some statistical
differences as compared to when the weight is 0. In the E/I Editor, the search and
replace function would not work. Finally, when trying to run automated experiments in
the short fashion, the routing statistics turned out incorrect.

Besides correcting these errors, various recommendations are made throughout this
thesis. Experiments with a database with frequency changes would test the actual strength
of the frequency external, and would be more operationally realistic than the E/I
database.

Changing  the  software  to  be  able  to  run  experiments  with  the  same  data  is 
recommended.  This  would  allow  for  more  controlled  experiments,  with  repeatable 
results. 

Other  data  fusion  and  correlation  algorithms  should  be  implemented,  tested  and 
evaluated  against  the  current  ones.  In  addition,  the  decrease  in  performance  due  to 
the  current  correlation  algorithm  should  be  analyzed. 

Allowing the evaluation of routing based on the results of the merge algorithm is
recommended. The percentages for each internal after the merge algorithm were
consistently ninety and above, which suggests that routing accuracies greater than the
40% currently obtained in the field could be expected.

Finally,  the  addition  of  statistical  analysis  routines  to  the  Testbed  will  eliminate  the 
need  to  do  these  functions  manually  as  was  done  in  this  thesis.  Addition  of  these 
routines  would  make  experimental  data  collection  faster. 

APPENDIX A


SYSTEM  DEFAULTS  IN  EIDEFAULTS  FILE 


                                  DEFAULT ALGORITHM NUMBER    TRAINING FILES
CORRELATION ALGORITHM=            3                           ASSOCIATIONS.DAT
ELINT PROCESSOR ALGORITHM=        2                           RELATIONS_ELINT.DAT
EXTERNALS PROCESSOR ALGORITHM=    2                           RELATIONS.DAT

EXTERNAL SIMULATION

FREQUENCY: ERROR TOLERANCE= 0.001    RELATIVE WEIGHT= 100
RADIO TYPE                           RELATIVE WEIGHT=
LOCATION                             RELATIVE WEIGHT=
DIRECTION                            RELATIVE WEIGHT=
MODULATION TYPE                      RELATIVE WEIGHT= 100

ELINT SIMULATION

ELINT RELEVANT TIME THRESHOLD= 1200       TYPE RELATIVE WEIGHT= 100
ELINT RELEVANT DISTANCE THRESHOLD= 1      LOCATION RELATIVE WEIGHT= 100
ELINT LOCATION ERROR TOLERANCE= 3

CORRELATION WEIGHTS

                       LANG    SPK    ACT    PLAT
INTERNAL SIMULATION    100     100    -      100
EXTERNALS PROCESSOR    100     100    100    100
ELINT PROCESSOR        100     100    100    100

OTHER  PARAMETERS 

APPENDIX B


SAMPLE  OF  QUICK  EXPERIMENT  RESULTS 


5 EXTERNALS

SPEAKER ID:                                                    MEAN     STANDARD DEVIATION
Correlator hypo: spkr. corr.    84.86%     88.33%     78.40%
Simulator: speaker correct      64.94%     66.11%     66.67%
% speaker improvement                                          17.96    5.51

LSP COMBINATION:                                               MEAN     STANDARD DEVIATION
LSP correlated correct          83.27%     88.33%     78.38%
LSP internals correct           64.94%     66.11%     66.67%
% LSP improvement               18.33      22.22      11.71    17.42    5.31
AVG # hypotheses

CORRELATOR ROUTING:
completely correct              94.42%     100.00%    90.12%
partially correct               100.00%    100.00%    100.00%
accuracy                        97.43%     100.00%    95.52%
efficiency                      92.50%     94.63%     88.34%

4 EXTERNALS
DIR, MOD, RT, LOC

SPEAKER ID:                                                    MEAN     STANDARD DEVIATION
Correlator hypo: spkr. corr.    80.08%     85.00%     70.37%
Simulator: speaker correct      64.94%     66.11%     66.67%
% speaker improvement                                          12.58    7.91

LSP COMBINATION:                                               MEAN     STANDARD DEVIATION
LSP correlated correct          78.49%     76.67%     69.75%
LSP internals correct           64.94%     66.11%     66.67%
% LSP improvement               13.55%     10.56%     3.08%    9.06     5.39
AVG # hypotheses

CORRELATOR ROUTING:
completely correct
partially correct               100.00%    100.00%    100.00%
accuracy                        97.43%     100.00%    95.80%
efficiency                      88.78%     83.71%     85.93%


SPEAKER ID:                                                    MEAN     STANDARD DEVIATION
Correlator hypo: spkr. corr.    86.06%     93.33%     82.72%
Simulator: speaker correct      64.94%     66.11%     66.67%
% speaker improvement           21.12      27.22      16.05    21.46    5.59

LSP COMBINATION:                                               MEAN     STANDARD DEVIATION
LSP correlated correct          85.26%     93.33%     82.72%
LSP internals correct           64.94%     66.11%     66.67%
% LSP improvement               20.32      27.22      16.05    21.20    5.64
AVG # hypotheses

CORRELATOR ROUTING:
completely correct              94.42%     98.89%     85.80%
partially correct               100.00%    100.00%    100.00%
accuracy                        97.43%     99.46%     93.56%
efficiency                      98.70%     96.84%     97.95%

ELINT PROCESSOR    100    100    100    100

OTHER PARAMETERS

SCENARIO # 1    SCENARIO # 2    SCENARIO #

BETA= 3    CUTOFF= 5    GAMMA=

APPENDIX D


EXPERIMENT  LIST 


EXPERIMENTS  WITH  ELINT 


CW123 /100/100/100
CW123 /100/100/75
CW123 /100/100/50
CW123 /100/100/25
CW123 /100/75/100
CW123 /100/75/75
CW123 /100/75/50
CW123 /100/75/25
CW123 /100/50/100
CW123 /100/50/75
CW123 /100/50/50
CW123 /100/50/25
CW123 /100/25/100
CW123 /100/25/75
CW123 /100/25/50
CW123 /100/25/25
CW123 /75/100/100
CW123 /75/75/100
CW123 /75/50/100
CW123 /75/25/100
CW123 /50/100/100
CW123 /50/75/100
CW123 /50/50/100
CW123 /50/25/100
CW123 /25/100/100
CW123 /25/75/100
CW123 /25/50/100
CW123 /25/25/100
CW123 /75/100/75
CW123 /75/100/50
CW123 /75/100/25
CW123 /50/100/75
CW123 /50/100/50
CW123 /50/100/25
CW123 /25/100/75
CW123 /25/100/50
CW123 /25/100/25
CW123 /75/75/25
CW123 /75/75/50
CW123 /75/50/75
CW123 /75/50/50
CW123 /75/50/25
CW123 /75/25/75
CW123 /75/25/50
CW123 /75/25/25
CW123 /50/75/75
CW123 /50/75/50
CW123 /50/75/25
CW123 /50/50/75
CW123 /50/25/75
CW123 /25/25/75
CW123 /25/50/75
CW123 /25/75/25
CW123 /25/75/50
CW123 /25/75/75


EXPERIMENTS  WITHOUT  ELINT 


CW123 /100/100/0
CW123 /100/75/0
CW123 /100/50/0
CW123 /100/25/0
CW123 /75/100/0
CW123 /50/100/0
CW123 /25/100/0
CW123 /75/50/0
CW123 /75/25/0
CW123 /50/75/0
CW123 /25/75/0


EXPERIMENTS  NOT  PERFORMED 


CW123 /75/75/75
CW123 /50/50/50
CW123 /50/50/25
CW123 /50/25/25
CW123 /50/25/50
CW123 /25/50/25
CW123 /25/50/50
CW123 /25/25/25
CW123 /25/25/50
CW123 /50/25/0
CW123 /25/50/0
CW123 /75/75/0
CW123 /50/50/0
CW123 /25/25/0


ROUTING  TABLE 
S3:A3  6  /  L3;H2;A1

:T3:A3  6  /  L3:U2:A1 


INTERNAL SIMULATION
LANGUAGE  CORRECT 
SPEAKER  CORRECT 
PLATFORM  CORRECT 
LSP  CORRECT 

EXTERNAL  BASED 

ACTIVITY  CORRECT 


MERGE  ALGORITHM 
DIRECT  RELEVANCE 


LANGUAGE  CORRECT 

IMPROVEMENT  DUE  TO  MERGING 

SPEAKER  CORRECT 

IMPROVEMENT  DUE  TO  MERGING 

ACTIVITY  CORRECT 

IMPROVEMENT  DUE  TO  MERGING 

PLATFORM  CORRECT 

IMPROVEMENT  DUE  TO  MERGING 

CORRELATOR  HYPOTHESES 
INDIRECT  RELEVANCE 


TOP  HYPOTHESIS  CORRECT 

ANY  HYPOTHESIS  CORRECT 

AVERAGE  #  HYPOTHESES 

NORMALIZED  ANY  HYPOT.  CORRECT 

LANGUAGE  CORRECT 

IMPROVEMENT  DUE  TO  CORRELATION 

SPEAKER  CORRECT 

IMPROVEMENT DUE TO CORRELATION

ACTIVITY  CORRECT 

IMPROVEMENT  DUE  TO  CORRELATION 

PLATFORM  CORRECT 

IMPROVEMENT  DUE  TO  CORRELATION 

LSP  CORRECT 

IMPROVEMENT  DUE  TO  CORRELATION 

CORRELATOR  ROUTING 

INDIRECT RELEVANCE

COMPLETELY CORRECT

PARTIALLY CORRECT

ACCURACY 


SCENARIO GROUP (1,2,3)

75/50/25 

75/50/0 

100/100/25

100/100/0

75/100/25

75/75/25

75/100/0

75/100/50 

100/50/25 

MERGE ALGORITHM

LANGUAGE  IMPROVEMENT 

28.65 

27.76 

B'i'ltrBI 

27.80 

29.41 

SPEAKER  IMPROVEMENT 

36.61 

36.20 

36.48 

HESBH 

ACTIVITY IMPROVEMENT

0.00 

0.00 

0.00 

0.00 

0.00 

0.00 

0.00 

0.00 

0.00 

PLATFORM  IMPROVEMENT 

27.03 

26.81 

26.60 

26.75 

26.88 

26.06 

26.27 

27.72

25.65 

TOTAL 

91.53

90.71 

92.49 

90.26 

91.22 

91.36 

90.42 

91.74 

RANK 

5.00 

4.00 

12.00 

2.00 

9.00 

8.00 

14.00 

7.00 

CORRELATION ALGORITHM

TOP  HYPOTHESIS  CORRECT 

Knim 

88.08 

86.58 

86.26 

86.73 

86.90 

NORMAL. ANY HYP CORRECT

■EtKlcW 

tl.29 

50.92 

42.93 

■.l.l.LM 

49.46 

50.35 

42.28 

imiiiii 


LANGUAGE  IMPROVEMENT 

27.24 

SPEAKER  IMPROVEMENT 

30.03 

33.46 

29.85 

31.08 

ACTIVITY  IMPROVEMENT 

-2.5^ 

-2.16 

-2.01 

-2.78 

■BSH 

-2.83 

BHEQI 

PLATFORM  IMPROVEMENT 

23.41

23.27 

22.47 

LSP  IMPROVEMENT 

60.37 

59.98

59.74 

58.37 

57.52 

58.90 

58.45 

ROUTING

COMPLETELY  CORRECT 

97.43 

97.55 

97.44 

97.43 

97.55 

hthi^hI 

PARTIALLY  CORRECT 

97.04 

97.19 

97.05 

97.20 

97.04 

97.19 

97.19 

97.19 

97.19 

ACCURACY 

97.04 

97.19 

97.05 

97.20 

97.04 

97.19 

97.19 

97.19 

97.19 

EFFICIENCY 

83.42 

89.80 

89.31 

89.52 

89.94 

89.32 

89.59 

88.41 

88.53 

TOTAL 

652.86 

650.01 

657.98 

645.48 

671.07 

650.48 

KtSW 

K'l-JKB 

645.91 

RANK 

5.00 

9.00 

2.00 

13.00 

1.00 

7.00 

4.00 


GRAND  TOTAL 

1295.67 

1292.93 

1302.88 

1288.99 

1288.67 

1288.08 

1288.11 

RANK  FOR  GRAND  TOTAL 

2.00 

3.00 

4.00

4.00 

1.00 

6.00 

7.00 

9.00 

8.00 

TOTAL  RANK 

10.00 

beesb 

14.00 

15.00 

16.00 

16.00 

18.00 

18.00 

19.00 

SCENARIO  GROUP  (4,5,6) 

75/75/25 

75/25/50 

100/50/25 

100/75/75 

50/50/75 

100/50/100

75/50/50 

100/50/75 

75/50/75 

MERGE ALGORITHM

LANGUAGE  IMPROVEMENT 

IHBH 

28.50 

28.32 

26.29 

26.80 

27.04 

WEBSM 

H^EHI 

27.58 

SPEAKER  IMPROVEMENT 

35.96 

36.19 

36.09 

34.09 

34.10 

ACTIVITY  IMPROVEMENT 

5.68 

5.68 

5.68 

5.68 

5.68 

5.68 

5.68 

5.68 

5.68 

PLATFORM  IMPROVEMENT 

HHHH 

21.79 

21.65 

IHBH 

HESEH 

22.90 

19.88 

■SBi 

TOTAL 

91.27 

■SBOi 

87.27 

87.16 

90.34 

88.03 

90.44 

86.62 

RANK 

4.00

1.00 

3.00 

18.00 

19.00 

9.00 

16.00 

8.00 

21.00 

CORRELATION  ALGORITHM 

TOP  HYPOTHESIS  CORRECT 

87.36 

85.79 

86.33 

89.88 

86.12 

83.69 

85.48 

79.82 

KEESH 

NORMAL. ANY HYP CORRECT

35.00 

35.25 

48.43 

43.90 

35.66 

39.20 

36.90 

40.09 


LANGUAGE  IMPROVEMENT 

26.06 

HHESH 

25.18 

23.80 

24.48 

HSHH 

24.84 

HHHH 

SPEAKER  IMPROVEMENT 

29.88 

29.08 

WEnBm 

KSDI 

HE3BH 

26.46 

28.26 

■BBH 

Hl^lHi 

ACTIVITY  IMPROVEMENT 

12.47 

9.63 

12.16 

14.13 

12.63 

9.00 

10.70 

3.61 

9.08 

PLATFORM  IMPROVEMENT 

24.36 

21.75 

23.73 

24.94 

23.21 

II^S^H 

22.42 

20.38 

19.90 

LSP  IMPROVEMENT 

HHSH 

56.19 

I^HEsH 

HSSIHH 

57.02 

54.48 

55.52 

49.44 

ROUTING

COMPLETELY  CORRECT 

99.58 

99.26 

99.34 

HHESH 

hshh 

99.19 

IH3EH 

HHHH 

PARTIALLY  CORRECT 

99.16 

99.26 

98.45 

98.72 

99.09 

99.09 

99.10 

98.99 

ACCURACY 

99.53 

99.16 

99.26 

98.45 

98.72 

99.09 

99.09 

99.10 

98.99 

EFFICIENCY 

53.40 

55.11

48.00 

80.95 

70.25 

50.20 

46.53 

54.67 

65.09 

TOTAL 

627.85 

616.05 

613.23 

671.62 

il^Sl^SI 

610.31 

588.02 

WJm.vMi 

RANK 

7.00 

10.00 

11.00 

1.00 

3.00 

16.00 

13.00 

21.00 

9.00 


GRAND  TOTAL 

1261.21 

1255.39 

Iktkiy-l 

1254.19 

1234.96 

1226.50 

RANK  FOR  GRAND  TOTAL 

3.00 

4.00 

1.00

5.00 

6.00 

9.00 

10.00

7.00 

TOTAL  RANK 

■eqbh 

11.00 

14.00 

19.00 

21.00 

25.00 

29.00 

29.00 

ibsbsi 
MERGE ALGORITHM


LANGUAGE  IMPROVEMENT 


SPEAKER  IMPROVEMENT 


ACTIVITY  IMPROVEMENT 


PLATFORM  IMPROVEMENT 


TOTAL 


RANK


CORRELATION  ALGORITHM 


TOP  HYPOTHESIS  CORRECT 


NORMAL.  ANY  HYP  CORRE 


0.00 


26.27  26.53 


93.20  88.77 


1.00  23.00 


LANGUAGE  IMPROVEMENT 


SPEAKER  IMPROVEMENT 


ACTIVITY  IMPROVEMENT 


PLATFORM  IMPROVEMENT 


LSP  IMPROVEMENT 


ROUTING 


COMPLETELY  CORRECT 


PARTIALLY  CORRECT 


ACCURACY 


EFFICIENCY 


TOTAL 


RANK 


GRAND  TOTAL 


RANK  FOR  GRAND  TOTAL 


TOTAL  RANK 


SCENARIO  GROUP  (4,5,6) 


MERGE  ALGORITHM 


LANGUAGE  IMPROVEMENT  26.46 


SPEAKER  IMPROVEMENT  36.23 


ACTIVITY  IMPROVEMENT 


PLATFORM  IMPROVEMENT 


CORRELATION  ALGORITHM 


TOP  HYPOTHESIS  CORRECT 


NORMAL.  ANY  HYP  CORREC 


ACTIVITY  IMPROVEMENT 


PLATFORM  IMPROVEMENT 


LSP  IMPROVEMENT 


ROUTING 


COMPLETELY  CORRECT 


PARTIALLY  CORRECT 


72.44 


75.13  81.76  73.13


34.67 


1221.03  1226.70 


RANK  FOR  GRAND  TOTAL 


1196.89  1197.98  1192.28  1189.79 


14.00  17.00 


31.00  32.00
MERGE ALGORITHM


LANGUAGE  IMPROVEMENT 


SPEAKER  IMPROVEMENT 


ACTIVITY  IMPROVEMENT 


PLATFORM  IMPROVEMENT 


CORRELATION  ALGORITHM 


TOP  HYPOTHESIS  CORRECT 


NORMAL.  ANY  HYP  CORRE 


LANGUAGE  IMPROVEMENT 


SPEAKER  IMPROVEMENT 


ACTIVITY  IMPROVEMENT 


PLATFORM  IMPROVEMENT 


LSP  IMPROVEMENT 


ROUTING 


COMPLETELY  CORRECT 


PARTIALLY  CORRECT 


ACCURACY 


EFFICIENCY 


TOTAL 


RANK 


GRAND  TOTAL 


RANK  FOR  GRAND  TOTAL 


TOTAL  RANK 


SCENARIO  GROUP  (4,5,6) 


MERGE  ALGORITHM 


LANGUAGE  IMPROVEMENT 


SPEAKER  IMPROVEMENT 


ACTIVITY  IMPROVEMENT 


PLATFORM  IMPROVEMENT 


CORRELATION  ALGORITHM 


TOP  HYPOTHESIS  CORRECT 


NORMAL.  ANY  HYP  CORRE 


LANGUAGE  IMPROVEMENT 


SPEAKER  IMPROVEMENT 


ACTIVITY  IMPROVEMENT 


PLATFORM  IMPROVEMENT 


LSP  IMPROVEMENT 


ROUTING 


COMPLETELY  CORRECT 


PARTIALLY  CORRECT 


ACCURACY 


EFFICIENCY 


GRAND  TOTAL  1187.24 


RANK  FOR  GRAND  TOTAL  19.00 


TOTAL  RANK  45.00 


MERGE  ALGORITHM 


LANGUAGE  IMPROVEMENT 


SPEAKER  IMPROVEMENT 


ACTIVITY  IMPROVEMENT 


PLATFORM  IMPROVEMENT 


TOTAL 


RANK


CORRELATION  ALGORITHM 


TOP  HYPOTHESIS  CORRECT 


NORMAL.  ANY  HYP  CORRE 


LANGUAGE  IMPROVEMENT 


SPEAKER  IMPROVEMENT 


ACTIVITY  IMPROVEMENT 


PLATFORM  IMPROVEMENT 


LSP  IMPROVEMENT 


ROUTING 


COMPLETELY  CORRECT 


PARTIALLY  CORRECT 


ACCURACY 


EFFICIENCY 


TOTAL 


RANK


RANK  FOR  GRAND  TOTAL 


TOTAL  RANK 


SCENARIO GROUP (4,5,6)


0.00 


23.36  23.05 


MERGE  ALGORITHM 


LANGUAGE  IMPROVEMENT 


SPEAKER  IMPROVEMENT 


ACTIVITY IMPROVEMENT


PLATFORM  IMPROVEMENT 


CORRELATION  ALGORITHM 


TOP  HYPOTHESIS  CORRECT 


NORMAL.  ANY  HYP  CORRE 


LANGUAGE  IMPROVEMENT 


SPEAKER  IMPROVEMENT 


ACTIVITY  IMPROVEMENT 


PLATFORM  IMPROVEMENT 


LSP  IMPROVEMENT 


ROUTING 


COMPLETELY  CORRECT 


PARTIALLY CORRECT


ACCURACY 


EFFICIENCY 


TOTAL 


GRAND  TOTAL 


RANK  FOR  GRAND  TOTAL 


1143.44 


29.00 


1129.29 


35.00 


62.00 


1139.21 


31.00 


64.00 


1132.10 


34.00 


65.00 


68.00  70.00 


COMPLETELY  CORRECT 


PARTIALLY  CORRECT 


97.18

97.19

97.18

97.19 

84.68 

87.19 

616.85 

633.73 

41.00 

25.00 

98.12 

97.44 

97.84 

97.05 

97.84 

97.05 

84.05

87.47 

606.08 

621.24 

46.00 

36.00 

SCENARIO  GROUP  (4,5,6)
GRAND  TOTAL


RANK  FOR  GRAND  TOTAL


TOTAL  RANK 


SCENARIO  GROUP  (4,5,6) 


MERGE  ALGORITHM


LANGUAGE  IMPROVEMENT 


SPEAKER  IMPROVEMENT 


ACTIVITY  IMPROVEMENT 


PLATFORM  IMPROVEMENT 


CORRELATION  ALGORITHM 


TOP  HYPOTHESIS  CORRECT 


NORMALIZ.  ANY  HYP  CORRECT


SPEAKER  IMPROVEMENT 


ACTIVITY  IMPROVEMENT


PLATFORM  IMPROVEMENT 


LSP  IMPROVEMENT 


COMPLETELY  CORRECT


PARTIALLY  CORRECT 


ACCURACY 


EFFICIENCY 


GRAND  TOTAL 


RANK  FOR  GRAND  TOTAL


1024.86 


47.00 


97.00 


1003.70 


49.00 


97.00 


SCENARIO  GROUP  (1,2,3)


SCENARIO  GROUP  (7,8,9)


SCENARIO  GROUP  (7,8,9) 
GRAND  TOTAL


RANK  FOR  GRAND  TOTAL


TOTAL  RANK 


MERGE  ALGORITHM 


LANGUAGE  IMPROVEMENT 


SPEAKER  IMPROVEMENT 


ACTIVITY  IMPROVEMENT


PLATFORM  IMPROVEMENT 


TOTAL 


CORRELATION  ALGORITHM 


TOP  HYPOTHESIS  CORRECT 


NORMALIZ.  ANY  HYP  CORRECT


LANGUAGE  IMPROVEMENT 


SPEAKER  IMPROVEMENT 


ACTIVITY  IMPROVEMENT


PLATFORM  IMPROVEMENT 


LSP  IMPROVEMENT 


ROUTING 


COMPLETELY  CORRECT 


PARTIALLY  CORRECT 


ACCURACY 


EFFICIENCY 


TOTAL 


RANK 


GRAND  TOTAL


RANK  FOR  GRAND  TOTAL 


TOTAL  RANK 


REPRODUCIBLE  (1,2,3)


MERGE  ALGORITHM 


LANGUAGE  IMPROVEMENT 


SPEAKER  IMPROVEMENT 


ACTIVITY  IMPROVEMENT 


PLATFORM  IMPROVEMENT 


0.00 


22.78  22.91  26.38 


CORRELATION  ALGORITHM 


TOP  HYPOTHESIS  CORRECT 


NORMALIZ.  ANY  HYP  CORRECT


82.24  80.95 


LANGUAGE  IMPROVEMENT 


SPEAKER  IMPROVEMENT 


ACTIVITY  IMPROVEMENT 


PLATFORM  IMPROVEMENT 


LSP  IMPROVEMENT 


ROUTING 


COMPLETELY  CORRECT 


PARTIALLY  CORRECT 


ACCURACY 


EFFICIENCY 


TOTAL 


24.97  25.10


RANK  FOR  GRAND  TOTAL


TOTAL  RANK 


72.00  73.00
GRAND  TOTAL


RANK  FOR  GRAND  TOTAL 


TOTAL  RANK 




SCENARIO  GROUP  (7,8,9)


MERGE  ALGORITHM 


LANGUAGE  IMPROVEMENT 


SPEAKER  IMPROVEMENT 


ACTIVITY  IMPROVEMENT 


PLATFORM  IMPROVEMENT 


TOTAL 


RANK 


CORRELATION  ALGORITHM 


TOP  HYPOTHESIS  CORRECT 


NORMALIZ.  ANY  HYP  CORRECT




35.05  29.92 


LANGUAGE  IMPROVEMENT


SPEAKER  IMPROVEMENT


ACTIVITY  IMPROVEMENT 


PLATFORM  IMPROVEMENT


LSP  IMPROVEMENT 


ROUTING 


COMPLETELY  CORRECT 


PARTIALLY  CORRECT 


ACCURACY 


EFFICIENCY 


TOTAL 


RANK 


11.31 

20.84

15.52 

16.67 

-4.63 

-4.63 

0.39 


93.18

95.66 

93.18

96.03 

95.66 

63.63 

62.56 

65.01 

63.07 

447.19 

59.00 

48.00
GRAND  TOTAL


RANK  FOR  GRAND  TOTAL 


TOTAL  RANK 


REPRODUCIBLE  (1,2,3) 


MERGE  ALGORITHM 


LANGUAGE  IMPROVEMENT 


SPEAKER  IMPROVEMENT 


ACTIVITY  IMPROVEMENT 


PLATFORM  IMPROVEMENT 


CORRELATION  ALGORITHM 


TOP  HYPOTHESIS  CORRECT 


NORMALIZ.  ANY  HYP  CORRECT




LANGUAGE  IMPROVEMENT


SPEAKER  IMPROVEMENT 


ACTIVITY  IMPROVEMENT 


PLATFORM  IMPROVEMENT 


LSP  IMPROVEMENT


ROUTING 


COMPLETELY  CORRECT 


PARTIALLY  CORRECT 


ACCURACY 


EFFICIENCY 


25.74  21.11 


19.31 


-4.76 


15.44  18.02 


48.64  48.51 




GRAND  TOTAL


RANK  FOR  GRAND  TOTAL 


TOTAL  RANK 




MERGE ALGORITHM
LANGUAGE IMPROVEMENT          27.80   27.76   28.65   28.61   28.96   27.37   27.69
SPEAKER IMPROVEMENT           35.59   36.35   36.77   36.48   36.20   35.34   35.60
ACTIVITY IMPROVEMENT           0.00    0.00    0.00    0.00    0.00    0.00    0.00
PLATFORM IMPROVEMENT          26.88   26.60   26.81   26.27   26.06   27.72   26.09
TOTAL                         90.26   90.71   92.23   91.36   91.22   90.42   89.37
RANK                          15.00   12.00    4.00    8.00    9.00   14.00   20.00

CORRELATION ALGORITHM
TOP HYPOTHESIS CORRECT        90.39   88.08   88.48   86.26   86.58   86.73   86.96
NORMALIZ. ANY HYP CORRECT     55.59   50.92   41.29   49.39   49.46   50.35   52.20
TOTAL                        145.98  139.01  129.77  135.65  136.04  137.08  139.16
RANK                           1.00    9.00   30.00   15.00   14.00   12.00    8.00

LANGUAGE IMPROVEMENT          25.97   25.04   25.77   24.96   25.69   25.19   23.96
SPEAKER IMPROVEMENT           33.46   32.09   31.51   29.88   29.85   31.08   30.84
ACTIVITY IMPROVEMENT          -1.68   -2.01   -2.16   -2.83   -2.43   -1.84   -2.76
PLATFORM IMPROVEMENT          24.24   23.27   23.41   22.47   22.12   24.38   22.05
LSP IMPROVEMENT               61.67   59.74   59.98   57.52   57.95   58.90   57.74
TOTAL                        143.65  138.14  138.51  131.99  133.19  137.71  131.82
RANK                           1.00    4.00    3.00   14.00   13.00    5.00   15.00

ROUTING
COMPLETELY CORRECT
PARTIALLY CORRECT
ACCURACY
EFFICIENCY
TOTAL                        381.45  380.84  381.73  381.52  381.25  380.34  381.53


GRAND  TOTAL 


RANK  FOR  GRAND  TOTAL  1.00  2.00 


TOTAL  RANK  22.00  36.00
Rome  Laboratory
Customer  Satisfaction  Survey 


RL-TR- _ 

Please  complete  this  survey,  and  mail  to  RL/IMPS, 

26  Electronic  Pky,  Griffiss  AFB  NY  13441-4514.  Your  assessment  and
feedback  regarding  this  technical  report  will  allow  Rome  Laboratory 
to  have  a  vehicle  to  continuously  improve  our  methods  of  research, 
publication,  and  customer  satisfaction.  Your  assistance  is  greatly 
appreciated. 

Thank  You 


Organization  Name:  (Optional)
Organization  POC:  (Optional)


Address:


1.  On  a  scale  of  1  to  5  how  would  you  rate  the  technology
developed  under  this  research?

5-Extremely  Useful  1-Not  Useful/Wasteful

Rating _

Please  use  the  space  below  to  comment  on  your  rating.  Please
suggest  improvements.  Use  the  back  of  this  sheet  if  necessary.


2.  Do  any  specific  areas  of  the  report  stand  out  as  exceptional? 

Yes _  No _ 

If  yes,  please  identify  the  area(s),  and  comment  on  what 
aspects  make  them  "stand  out." 


3.  Do  any  specific  areas  of  the  report  stand  out  as  inferior? 

Yes _  No _ 

If  yes,  please  identify  the  area(s),  and  comment  on  what 
aspects  make  them  "stand  out."

4.  Please  utilize  the  space  below  to  comment  on  any  other  aspects 
of  the  report.  Comments  on  both  technical  content  and  reporting 
format  are  desired. 


